Planet Linux Australia
Celebrating Australians & Kiwis in the Linux and Free/Open-Source community...

January 22, 2020

linux.conf.au 2020 recap

It's that time of year again. Most of OzLabs headed up to the Gold Coast for linux.conf.au 2020.

linux.conf.au is one of the longest-running community-led Linux and Free Software events in the world, and attracts a crowd from Australia, New Zealand and much further afield. OzLabbers have been involved in LCA since the very beginning and this year was no exception with myself running the Kernel Miniconf and several others speaking.

The list below contains some of our highlights that we think you should check out. These are just a few of the talks that we managed to make it to - there's plenty more worthwhile stuff on the linux.conf.au YouTube channel.

We'll see you all at LCA2021 right here in Canberra...

Keynotes

A couple of the keynotes really stood out:

Sean is a forensic structural engineer who shows us a variety of examples, from structural collapses to firefighting disasters, where trained professionals were blinded by their expertise and couldn't bring themselves to do things that were obvious.

There's nothing quite like cryptography proofs presented to a keynote audience at 9:30 in the morning. Vanessa goes over the issues with electronic voting systems in Australia, and especially internet voting as used in NSW, including flaws in their implementation of cryptographic algorithms. There continues to be no good way to do internet voting, but with developments in methodologies like risk-limiting audits there may be reasonably safe ways to do in-person electronic voting.

OpenPOWER

There was an OpenISA miniconf, co-organised by none other than Hugh Blemings of the OpenPOWER Foundation.

Anton (on Mikey's behalf) introduces the Power OpenISA and the Microwatt FPGA core which has been released to go with it.

Anton live demos Microwatt in simulation, and also tries to synthesise it for his FPGA but runs out of time...

Paul presents an in-depth overview of the design of the Microwatt core.

Kernel

There were quite a few kernel talks, both in the Kernel Miniconf and throughout the main conference. These are just some of them:

There have been many cases where we've introduced a syscall only to find out later on that we need to add some new parameters. How do we make our syscalls extensible so we can add new parameters later on without needing to define a whole new syscall, while maintaining both forward and backward compatibility? It turns out it's pretty simple but needs a few more kernel helpers.
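
As a rough illustration of the extensible-struct idea (a Python sketch rather than kernel C, and not code from the talk): userspace passes a struct together with its size, and the kernel copies it into its own version of the struct, similar in spirit to the kernel's copy_struct_from_user() helper. The function name copy_struct and the sizes used here are hypothetical.

```python
import errno


def copy_struct(user_bytes: bytes, kernel_size: int) -> bytes:
    """Copy a user-supplied struct into a kernel-sized buffer (illustrative only).

    - Older userspace (shorter struct): missing fields are zero-filled,
      so zero must mean "feature not requested".
    - Newer userspace (longer struct): accepted only if every byte the
      kernel does not understand is zero; otherwise fail with E2BIG.
    """
    user_size = len(user_bytes)
    if user_size > kernel_size:
        # Any non-zero trailing byte means a feature this kernel lacks.
        if any(user_bytes[kernel_size:]):
            raise OSError(errno.E2BIG, "unknown fields are set")
        return user_bytes[:kernel_size]
    # Pad the shorter struct with zeros up to the kernel's size.
    return user_bytes + bytes(kernel_size - user_size)
```

With this convention an old program keeps working on a new kernel, and a new program works on an old kernel as long as it does not request features that kernel lacks.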

There are a bunch of tools out there which you can use to make your kernel hacking experience much more pleasant. You should use them.

Among other security issues with container runtimes, using procfs to set up security controls during the startup of a container is fraught with hilarious problems, because procfs and the Linux filesystem API aren't really designed to do this safely, and also have a bunch of amusing bugs.

Control Flow Integrity is a technique for restricting exploit techniques that hijack a program's control flow (e.g. by overwriting a return address on the stack (ROP), or overwriting a function pointer that's used in an indirect jump). Kees goes through the current state of CFI supporting features in hardware and what is currently available to enable CFI in the kernel.

Linux has supported huge pages for many years, which has significantly improved CPU performance. However, the huge page mechanism was driven by hardware advancements and is somewhat inflexible, and it's just as important to consider software overhead. Matthew has been working on supporting more flexible "large pages" in the page cache to do just that.

Spoiler: the magical fantasy land is a trap.

Community

Lots of community and ethics discussion this year - one talk which stood out to me:

Bradley and Karen argue that while open source has "won", software freedom has regressed in recent years, and present their vision for what modern, pragmatic Free Software activism should look like.

Other

Among the variety of other technical talks at LCA...

Quantum compilers are not really like regular classical compilers (indeed, they're really closer to FPGA synthesis tools). Matthew talks through how quantum compilers map a program on to IBM's quantum hardware and the types of optimisations they apply.

Clevis and Tang provide an implementation of "network bound encryption", allowing you to magically decrypt your secrets when you are on a secure network with access to the appropriate Tang servers. This talk outlines use cases and provides a demonstration.

Christoph discusses how to deal with the hardware and software limitations that make it difficult to capture traffic at wire speed on fast fibre networks.

January 19, 2020

Annual Penguin Picnic, January 25, 2020

Jan 25 2020 12:00
Jan 25 2020 16:00
Location: Yarra Bank Reserve, Hawthorn

The Linux Users of Victoria Annual Penguin Picnic will be held on Saturday, January 25, starting at 12 noon at the Yarra Bank Reserve, Hawthorn.  In the event of hazardous levels of smoke or other dangerous weather, we will announce an alternate indoor location.

LUV would like to acknowledge Infoxchange for the Richmond venue.

Linux Users of Victoria Inc. is a subcommittee of Linux Australia.

January 25, 2020 - 12:00


LUV February 2020 Workshop: TBA

Feb 15 2020 12:30
Feb 15 2020 16:30
Location: Infoxchange, 33 Elizabeth St. Richmond

Topic to be announced

Please email the LUV committee at luv-ctte@luv.asn.au if you would like to give a talk, presentation or workshop or have a topic you would like to see covered.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121.  Late arrivals please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.

February 15, 2020 - 12:30


January 17, 2020

Linux.conf.au 2020 – Friday – Lightning Talks and Close

Steve

  • Less opportunity for Intern type stuff
  • Trying to build team with young people
  • Internships
  • They Need opportunities
  • Think about giving a chance

Martin

  • Secure Scuttlebutt
  • p2p social web
  • more like just a protocol
  • scuttlebutt.nz
  • Protocol used for other stuff.

Emma

  • LCA from my perspective

Mike Bailey

  • Pipe-skimming
  • Enhancing UI of CLI tools
  • Takes the first arg in a pipe and sends it to the next tool

Aleks

  • YOGA Book c930
  • Laptop with e-ink display for keyboard
  • Used wireshark to look at USB under Windows
  • Created a device driver based on packets windows was sending
  • Linux recognised it as a USB Keyboard and just works
  • Added new feature and
  • github.com/aleksb

Evan

  • Two factor authentication
  • It’s hard

Keith

  • Snekboard
  • Crowdsourced hardware project
  • crowdsupply.com/keith-packard/snekboard
  • $79 campaign, ends 1 March

Adam and Ben

  • idntfrs
  • bytes are not expensive any more

William

  • Root cause of swiss cheese

Colin

  • OWASP
  • For every person they taught about a vulnerability, 2 people appeared to write vulnerable code
  • WebGoat
  • Holds your hand through the OWASP vulnerability list. Exploit and fix
  • teaching, playing to break, go back and fix
  • Forks in various languages

Leigh

  • Masculinity
  • Leave it better than you found it

David

  • Fixing NAT
  • with more NAT

Caitlin

  • Glitter!
  • conferences should be playful
  • meetups can be friendly
  • Ways to introduce job
  • Stickers

Miles

  • Lies, Damn lies and data science
  • Hipster statistics
  • LCA 2021 is in Canberra


Linux.conf.au 2020 – Friday – Session 1 – Protocols / LumoSQL

The Fight to Keep the Watchers at Bay – Mark Nottingham

Disclaimer: I am not a security person, but in some sense we are all security people.

Why Secure the Internet

  • In the beginning it was just researchers and academics
  • Snowden was a watershed moment
  • STRINT Workshop in 2014
  • It’s not just your website, it’s the Javascript that somebody is injecting in front of it.

What has happened so far?

  • http -> https
    • In 2010 even major services were not encrypted; demo of the Firesheep program to grab cookies and auth off WiFi
    • Injecting cookies in http flows
    • Needed to shift needle to https
    • HTTP/2: big push to make it encrypted-only. It isn’t actually, though browsers only support it over https.
    • “Secure Contexts” cool features only https
  • Problem: Mixed Content
    • “Upgrading Insecure Requests” allow ad-hoc by pages
    • HTTPS is slow – istlsfastyet.com
    • Improvement in speed of implementations
    • Let’s Encrypt
  • Around 85-90% https as of Early 2020
  • Some people were unhappy
    • Slow Satellite internet said they needed middle boxes to optimise http over slow links
    • People who did http shared caching
  • TLS 1.2 -> TLS 1.3
    • Complex old protocol
    • Implementation monoculture
    • Outdated Crypto
    • TLS 1.3
      • Simplify where possible
      • encrypt most of handshake
      • get good review of protocol
      • At around 30%
      • Lots of implementations
    • Some unhappy. Financial institutions needed to sniff secure transactions (and had bought expensive appliances to do this)
      • They ended up forking their own protocol
  • TCP -> QUIC
    • TCP is unencrypted, lots of leaks and room for in-betweens to play around
    • QUIC – all encrypted
    • Spin Bit – single bit of data can be used by providers to estimate packet loss and delay.
  • DNS -> DOH
    • Lots of click data sold by ISPs
    • Countries hijacking DNS to block stuff
    • DNS over https can be co-located with a popular website
    • Some were unhappy
      • Lots of pushback from governments and big companies
      • Industry unhappy about concentration of DNS handling
      • Have to decide who to trust
  • SNI -> Encrypted SNI
    • Work in progress, very complex
    • South Korea unhappy, was using it to block people
  • Traffic Analysis
    • Packet length, frequency, destinations
    • TOR hard to tell. Looking at using multiplexing and fixed-length records
  • But the ends
    • Customer compromised or provider compromised (or otherwise sharing data)
  • Observations
    • Cost and Control
      • Cost: Big technology spends now obsolete
      • Control: some people want to do stuff on the network
    • We have to design the Internet for the pessimistic case
    • You can’t expose application data to the path anymore
    • Well-defined interfaces and counterbalanced roles
    • Technology and Policy need to work together and keep each other in check
    • Making some people unhappy means you need some guiding principles

LumoSQL – updating SQLite for the modern age – Dan Shearer

LumoSQL = SQLite + LMDB – WAL

SQLite

  • “Is a replacement for fopen()”
  • Key/Value stores.
    • Everyone used Sleepycat BDB – bought by Oracle and license changed
    • Many switched to LMDB (approx 2010)
  • Howard Chu 2013 SQLightning faster than SQLite but changes not adopted into SQLite

LumoSQL

  • Funded by NLNet Foundation
  • Dan Shearer and Keith Maxwell

What isn’t working with SQLite ?

  • Inappropriate/unsupported use cases
  • Speed
  • Corruption
  • Encryption

What has been done so far

  • Located code, started on github.com/LumoSQL
  • Benchmarking tool for versions matrix
  • Mapped out how the key-value store works
    • So different backend can be dropped in.
  • Fixed bugs with the port and with lmdb

What’s Next

  • First Release Feb 2020
  • Add Multiple backends
  • Implement two database advances


January 16, 2020

Linux.conf.au 2020 – Thursday – Session 3 – Software Freedom lost / Stream Processing

Open Source Won, but Software Freedom Hasn’t Yet: A Guide & Commiseration Session for FOSS activists by Bradley M. Kuhn, Karen Sandler

Larger events elsewhere tend to be corporate sponsored so probably wouldn’t accept a talk like this

Free Software Purists

  • About 2/3 of the audience had spent some time going out of their way to use free software
  • A few years ago you could only use free software
  • To watch TV. I can use DRM or I can pirate. Both are problems.
  • The web is a very efficient way to install proprietary software (javascript) on your browser
  • Most people don’t even see that or think about it

Laptops

  • 2010-era Laptops are some of the last that are fully free-software
  • Later have firmware and other stuff that is all closed.
  • HTC Dream – some firmware on phone bit but rest was free software

Electronic Coupons

  • Coupons are all digital. You need to run an app that tracks all your purchases
  • “As a Karen I sometimes ask the store to just let me have the coupon, even though it is expired”
  • Couldn’t install Disneyland App on older phones. So unable to bypass lines etc.

Proprietary dumping ground

  • Bradley had a device. Installed all the proprietary apps on it rather than his main phone
  • But it’s a bad idea since all the tracking stuff can talk to each other.

Hypocrisy of traditional free software advocacy

  • Do not criticise people for using proprietary software
  • It is almost impossible to live your life without using it
  • It should be an aspirational goal
  • Person should not be seen as a failure if they use it
  • Asking others to use it instead is worse than using it yourself
  • Karen’s Laptop: It runs Debian but it is only “98% free”

Paradox: The more FOSS there is, the less software freedom we actually have in our technology

  • But there is less software freedom than there was in 2006
  • Because everything is computerized, a lot more than 15 years ago.
  • More things in Linux that Big companies want in datacentres rather than tinkerers in their homes want.

What are the right choices?

  • Be mindful
  • Try when you can to use free software. Make small choices that support software freedom
  • Shine a light on the problem
  • Don’t let the shame you feel about using proprietary software paralyze you
  • and don’t let the problems we face overwhelm you into inaction
  • Re-prioritize your FOSS development time.
    • Is it going to give more people freedom in the world?
    • Maybe try to do a bit in your free time.
  • Support each other
  • FAIF.us podcast

Advanced Stream Processing on the Edge by Eduardo Silva

Data is everywhere. We need to be able to extract value from it

  • Put it all in a database to extract value
  • Challenge: Data comes from all sorts of places
    • More data -> more bandwidth -> more resource required
    • Delays as more data ingested
  • Challenge: lots of different formats

Ideal Tool

  • Collect from different sources
  • convert unstructured to structured
  • enrichment and filtering
  • multiple destinations like database or cloud services

Fluentbit

  • Started in 2015
  • Origins: lightweight log processor for embedded space
  • Ended up being used in cloud space
  • Written in C
  • Low mem and CPU
  • Pluggable architecture
  • input -> parser -> filter -> buffer -> routing -> output
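
The stage order above can be sketched as a toy pipeline. This is only an illustration of the data flow in Python; it is not Fluent Bit's API, and the names parse, run_pipeline and the route table are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical log format: "<level> <message>"
LINE_RE = re.compile(r"(?P<level>\w+) (?P<message>.*)")


def parse(raw: str) -> dict:
    """Parser stage: turn an unstructured line into a structured record."""
    match = LINE_RE.match(raw)
    return match.groupdict() if match else {"level": "unknown", "message": raw}


def run_pipeline(lines, tag, routes):
    """Push tagged input through parser -> filter -> buffer -> routing -> output."""
    buffer = []
    for raw in lines:                        # input
        record = parse(raw)                  # parser
        if record["level"] == "debug":       # filter: drop noisy records
            continue
        buffer.append((tag, record))         # buffer
    outputs = defaultdict(list)
    for record_tag, record in buffer:        # routing: match tag against prefixes
        for prefix, destination in routes:
            if record_tag.startswith(prefix):
                outputs[destination].append(record)  # output
    return dict(outputs)
```

For example, run_pipeline(["error disk full", "debug tick"], "app.web", [("app.", "db")]) delivers only the error record to the "db" destination.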

Structure Messages

  • Unstructured to structured
  • Metadata
  • Can add tags to data on input, use it later for routing

Stream processing

  • Perform processing while the data is still in motion
  • Faster data processing
  • in Memory
  • No tables
  • No indexing
  • Receive structured data, expose a query language
  • Normally done centrally

Doing this on the edge

  • Offload computation from servers to data collectors
  • Only sends required data to the cloud
  • Use a SQL-like language to write the queries
  • Integrated with fluent core

Functions

  • Aggregation functions
  • Time functions
  • Timeseries functions
  • You can also write functions in Lua

Also exposes Prometheus-type metrics


Linux.conf.au 2020 – Thursday – Session 2 – Origins of X / Aerial Photography

The History of X: Lessons for Software Freedom – Keith Packard

1984 – The Origins of X

  • Everything proprietary
  • Brian Reid and Paul Asente: V Kernel -> VGTS -> W window system
    • Ported to VAXstation 100 at Stanford
    • 68k processor, 128k of VRAM
    • B&W
  • Bob Scheifler started hacking W -> X
  • Ported to Unix , made more Unix Friendly (async) renamed X

Unix Workstation Market

  • Unix was closed source
  • Vendor Unix based on BSD 4.x
  • Sun, HP, Digital, Apollo, Tektronix, IBM
  • this was when the configure program happened
  • VAXstation II
    • Color graphics 8bit accelerated
  • Sun 3/60
    • CPU drew everything on the screen

Early Unix Window System – 85-86

  • SunView dominates (actual commercial apps, desktop widgets)
  • Digital VMS/US
  • Apollo had Domain
  • Tektronix demonstrated SmallTalk
  • all only ran on their own hardware

X1 – X6

  • non-free software
  • Used Internally at MIT
  • Shared with friends informally

X10 – approx 1986

  • Almost usable
  • Ported to various workstations
  • Distribution was not all free software (had bin blobs)
    • Sun port relied on SunView kernel API
    • Digital provided binary rendering code
    • IBM PC/RT Support completed in source form

Why X11 ?

  • X10 had warts
  • rendering model was pretty terrible
  • External Windows manager without borders
  • Other vendors wanted to get involved
    • Jim Gettys and Smokey Wallace
    • Write X11, release under liberal terms
    • Working against Sun
    • Displace Sunview
    • “Reset the market”
    • Digital management agreed

X11 Development 1986-87

  • Protocol designed by a cross-org team
  • Sample implementation done mostly at DEC WRL, collaboration with people at MIT
  • Internet not functional enough to properly collaborate, done via mail
    • Thus most of it happened at MIT

MIT X Consortium

  • Hired dev team at MIT
  • Funded by consortium
  • Members also voted on standards
    • Members stopped their own development
    • Stopped collaboration with non-members
  • We knew Richard too well – The GPL’s worst sponsor
  • Corp sponsors dedicated to non-free software

X Consortium Standards

  • XIE – X Imaging Extensions
  • PIX – Phigs Extension for X
  • LBX – Low Bandwidth X
  • Xinput (version 1)

The workstation vendors were trying to differentiate. They wanted a minimal base to build their stuff on. The standard was frozen for around 15 years. That is why X fell behind other environments as hardware changed.

X11, NeWS and PostScript

  • NeWS – Very slow but cool
  • Adobe adapted PostScript interpreter for window systems – Closed Source
  • Merged X11/NeWS server – Closed Source

The Free Unix Desktop

  • All the toolkits were closed source
  • Sunview -> XView
  • OpenView – Xt based toolkit

X Stagnates – ~1992

  • Core protocol not allowed to change
  • non-members pushed out
  • market fragments

Collapse of Unix

  • The Decade of Windows

Opening a treasure trove: The Historical Aerial Photography project by Paul Haesler

  • Geoscience Australia has inherited an extensive archive of historical photography
  • 1.2 million images from 1920 – 1990s
  • Full coverage of Aus and more (some places more than others)

Historical Archive Projects

  • Canonical source of truth is pieces of paper
  • Multiple attempts at scanning/transcription. Duplication and compounding of errors
  • Some errors in original data
  • “Historian” role to sift through and collate into a machine-readable form – usually spreadsheets
  • Data Model typically evolves over time – implementation must be flexible and open-minded

What we get

  • Flight Line Diagrams (metadata)
  • Imagery (data)
  • Lots scanned in early 1990s, but low resolution and missing data, some missed

Digitization Pipeline

  • Flight line diagram pipeline
    • High resolution scans
    • Georeferences
  • Film pipeline
    • Filmstock
    • High Resolution scans
    • Georeference images
    • Georectified images
    • Stitched mosaics + Elevation models

Only about 20% of film scanned. Lacking funding and film deteriorating

Other states have similar smaller archives (and other countries)

  • Many significantly more mature but may be locked in proprietary platforms

Stack

  • Open Data (CC BY 4.0)
  • Open Standards (RESTful, GeoJSON, STAC)
  • Open Source
  • PostgreSQL/PostGIS
  • Python3: Django REST Framework
  • Current Status: API Only. Alpha/proof-of-concept

API

  • Search for Flight runs
  • Output is GeoJSON
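
For readers unfamiliar with GeoJSON, a search result for a flight run could look roughly like the structure built below. This is a hypothetical sketch in Python: the property names run_id and year and the coordinates are made up, and the project's real schema may differ.

```python
import json


def flight_run_feature(run_id, coordinates, year):
    """Build a GeoJSON Feature whose geometry is the flight line."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude]
            "coordinates": [list(point) for point in coordinates],
        },
        # Hypothetical metadata fields, for illustration only
        "properties": {"run_id": run_id, "year": year},
    }


def feature_collection(features):
    """Wrap features in a FeatureCollection, the usual shape of a search result."""
    return {"type": "FeatureCollection", "features": list(features)}


if __name__ == "__main__":
    run = flight_run_feature("run-1", [(149.1, -35.3), (149.2, -35.3)], 1955)
    print(json.dumps(feature_collection([run]), indent=2))
```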

Coming Next

  • Scanning and georeferencing (need $$$)
  • Data entry/management tools – no spreadsheets
  • Refs to other archives, federated search
  • Integration with TerriaJS/National Map
  • Full STAC once standardized


Linux.conf.au 2020 – Thursday – Session 1 – .NET to Linux / Collecting information

Engineer tested, manager approved: Migrating Windows/.NET services to Linux – Katie Bell

Works at Campaign Monitor

  • sends email spam
  • Company around since 2004

Software product generations

  • Originally a monolith
  • Windows, C# .net framework, IIS, Monolithic SQLServer
  • Went to microservices (called Reckless Microservices)
  • Windows, C# .net , OWIN Hosting / Nancy , Modular databases

Gen 2 – “Reckless” Microservice

  • Easy to create a new microservice
  • and deploy etc
  • Runs in ec2

Wanted to move to tools like Docker and Kubernetes that were not well supported by Microsoft tools

Gen 3 – Docker Services

  • Linux
  • Java / Go

Lots of ways to do stuff

  • 3 different ways of doing everything
  • Confusing and big tax on developers
  • Losing knowledge about how the older Reckless stuff worked

A Crazy Idea

  • Run all the Reckless services in docker
  • Get rid of one whole generation

What does it take?

  • Move from .NET Framework to .NET Core
  • Framework very Windows specific – runtime installed at OS level
  • Core more open and cross-platform – self contained executable apps
  • But what about Mono? (Open Source .NET Framework)
    • Probably not worth the effort since Core is the way forward
  • But a lot of .NET Framework APIs not ported over to .NET Core. Some replaced by new APIs
  • .NET Standard libraries are supported on both though, and there are lots of them

What Doesn’t port to Core?

  • Libraries moved/renamed
  • Some libs dropped
  • IIS, ASP.NET replaced with ASP.NET Core + MVC
  • WCF Server communication
  • Old unmaintained libraries

Luckily Reckless was not using ASP.NET so it shouldn’t be too hard to do. Maybe not such a crazy idea.

But most companies don’t let people spend lots of time on Tech Debt.

Asked for something small – 2 weeks of 3 people.

  • 1 week: Hacky proof of concept (getting 1 service to run in .NET Core)
  • 2nd week: Document and investigate what full project would require and have to do
  • Last Day: Time estimates
  • Found that Windows ec2 instance were 45%
  • Cost saving alone of moving from Windows to Linux justified the project
  • Pitching:
    • Demo
    • Detailed time estimates
    • Proposal with multiple options
    • Concrete benefits, cost savings, problems with rusty old infra
  • Microsoft Portability Analyzer
    • Just run across app and gives very detailed output
  • icanhasdot.net
    • Good for external dependencies

Web Hosting differences

  • OWIN Hosting vs Kestrel
  • ASP.NET Core DI

Libraries that Do support .NET Standard

  • Had to upgrade all our code to support the new versions
  • Major changes in places

OS Differences

  • case-sensitive filenames
  • Windows services, event logging

Libraries that did not support .NET Standard

  • Magnum – unmaintained
  • Topshelf

.NET Framework Libraries can be run under .NET Core using compatibility shim. Sometimes works but not really a good idea. Use with extreme caution

Overall Result

  • Took 6-8 months of 2-3 people
  • Everything migrated over.
  • Around 100 services
  • 78 actually running
  • 43 really needed to be migrated
  • 31 actually needed in the end
  • Estimated old hosting cost $145k/year
  • Estimated new hosting costing $70k/year
  • Actual hosting cost $15k/year
  • Got rid of almost all the extra infrastructure that was used to support reckless. another $25k/year saved

Advice for cleanup projects

  • Ask for something small
  • Test the idea
  • Demonstrate the business case
  • Build detailed time estimates

Collecting information with care by Opel Symes

The Problem

  • People build systems for people without checking that our assumptions about people are valid
  • Be aware of my assumptions, this doesn’t cover all areas

Names

  • Form “First Name” and “Last Name” -> “Dear John Smith”
  • Fields Required – should be optional
  • Should not do character checks ( blocking accents etc )
  • Check that production supports emoji... everywhere
  • MySQL character encodings: utf8mb4 is only available since 5.5 and is the default in MySQL 8
  • Every database, table and text column and default needs to be changed to the new character set. Set connection options so things don’t get lost in transfer.
  • Personal Names around the world
  • Chinese names
  • Names can be long
  • Recommendation
    • Ask for “Full name” (where a legal name is required) and “Greeting”
    • Unicode all the way down – test with emoji
    • No Length limits

Email

  • Email addresses are quite complex
  • Does it have an “@”
  • Check it is not a simple typo of a well-known email domain
  • Will it be accepted by the email sender?
  • Look for an MX record
  • Ask the SMTP server if this username is valid
  • Simple checks for common errors
  • Don’t roll your own checking, use your own mail server or the mail library that you will be using to send.
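
The two cheapest checks in the list (is there an "@", and is the domain a near miss of a well-known one) might be sketched like this in Python. The domain list and the similarity cutoff are arbitrary assumptions, and the MX and SMTP checks are left out because they need network access.

```python
import difflib

# Illustrative list only; a real deployment would use its own data.
WELL_KNOWN_DOMAINS = ["gmail.com", "outlook.com", "yahoo.com"]


def check_email(address: str):
    """Return (ok, suggestion); suggestion is a likely intended address or None."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False, None                   # no "@", or an empty side
    domain = domain.lower()
    if domain in WELL_KNOWN_DOMAINS:
        return True, None
    # Offer a correction when the domain is very close to a well-known one.
    close = difflib.get_close_matches(domain, WELL_KNOWN_DOMAINS, n=1, cutoff=0.8)
    suggestion = f"{local}@{close[0]}" if close else None
    return True, suggestion
```

The suggestion is meant to be shown to the user ("did you mean ...?"), not applied silently, since an unusual domain may be perfectly valid.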

Gender

  • Transgender vs Cisgender
  • Non-binary – Gender that isn’t male or female
  • Don’t just give the two options
  • A 3rd “other” option isn’t ideal
  • A freeform field is good.
  • Gender Alternative from Nikki Stevens
  • Instead ask if people belong to an “under-represented community”

Pronouns

  • What pronouns should we use to refer to you? ( he , she, they )
  • Works okay in English but may not in other languages
  • Some languages lack a gender-neutral pronoun
  • Some languages lack gender pronouns
  • pronoun.is

Titles

  • Ask for “None” but don’t actually print it “Dear None Smith”
  • Ask for Mx
  • Have a freeform field ( Dr, Count )
  • Maybe avoid titles if possible
  • Don’t show people according to gender, ask specifically.

Gender – WGEA

  • The Act defines gender as male or female.
  • Others are not reported.
  • Have an explanation for people who don’t fit in the above

Data Retention

  • Make it simple to change
  • Give users options if it isn’t (eg show preferred name)

Changing Username

  • Usernames are often options
  • Changing them comes with some caveats
  • Use UUIDs to link to users rather than usernames
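
A minimal sketch of that last point, assuming a toy in-memory store (the class and method names are hypothetical): other records hold the user's UUID, so a username change touches exactly one place.

```python
import uuid


class UserDirectory:
    """Toy in-memory store: records reference users by UUID, never by username."""

    def __init__(self):
        self._username_by_id = {}

    def add_user(self, username: str) -> uuid.UUID:
        user_id = uuid.uuid4()               # stable identifier, never shown or edited
        self._username_by_id[user_id] = username
        return user_id

    def rename(self, user_id: uuid.UUID, new_username: str) -> None:
        self._username_by_id[user_id] = new_username

    def username(self, user_id: uuid.UUID) -> str:
        return self._username_by_id[user_id]


if __name__ == "__main__":
    directory = UserDirectory()
    uid = directory.add_user("old_name")
    post = {"author_id": uid, "body": "hello"}   # links by UUID
    directory.rename(uid, "new_name")
    print(directory.username(post["author_id"]))
```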

Changing Emails

  • There are security implications

Deleting Data

  • Make it possible and not too hard


Sharing your WiFi connection with a NetworkManager hotspot

In-flight and hotel WiFi can be quite expensive and often insist on charging users extra to connect multiple devices. In order to avoid that, it's possible to easily create a WiFi hotspot using NetworkManager and an external USB WiFi adapter.

Creating the hotspot

The main trick is to right-click on the NetworkManager icon in the status bar and select "Edit Connections..." (not "Create New WiFi Network..." despite the promising name).

From there click the "+" button in the lower right then "WiFi" as the Connection Type. I like to use the computer name as the "Connection name".

In the WiFi tab, set the following:

  • SSID: machinename_nomap
  • Mode: hotspot
  • Device: (the device name of the USB WiFi adapter)

The _nomap suffix is there to opt out of the Google and Mozilla location services which could allow anybody to look up sightings of your device around the world.

In the WiFi Security tab:

  • Security: WPA & WPA2 Personal
  • Password: (a 63-character random password generated using pwgen -s 63)

While you may think that such a long password is inconvenient, it's now possible to add the network automatically by simply scanning a QR code on your phone.
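
As a sketch of how that could work, the Python below generates a 63-character random password (similar in spirit to pwgen -s 63) and builds the text for a WiFi QR code. It assumes the widely used "WIFI:" text format that many phone cameras recognise; the function names are my own, and turning the string into an actual QR image is left to whatever QR generator you prefer.

```python
import secrets
import string


def random_password(length: int = 63) -> str:
    """Generate a random alphanumeric password (63 is the WPA passphrase maximum)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def wifi_qr_payload(ssid: str, password: str) -> str:
    """Build the text to encode in a WiFi QR code for a WPA/WPA2 network."""
    def escape(value: str) -> str:
        # Backslash-escape characters that are special in this format,
        # handling the backslash itself first to avoid double escaping.
        for ch in '\\;,:"':
            value = value.replace(ch, "\\" + ch)
        return value

    return f"WIFI:T:WPA;S:{escape(ssid)};P:{escape(password)};;"


if __name__ == "__main__":
    print(wifi_qr_payload("machinename_nomap", random_password()))
```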

In the IPv4 Settings tab:

  • Method: Shared to other computers

Finally, in the IPv6 Settings tab:

  • Method: Ignore

I ended up with the following config in /etc/NetworkManager/system-connections/machinename:

[connection]
id=machinename
uuid=<long UUID string>
type=wifi
interface-name=wl...
permissions=
timestamp=1578533792

[wifi]
mac-address=<MAC>
mac-address-blacklist=
mode=ap
seen-bssids=<BSSID>
ssid=machinename_nomap

[wifi-security]
key-mgmt=wpa-psk
psk=<63-character password>

[ipv4]
dns-search=
method=shared

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
ip6-privacy=0
method=ignore

Firewall rules

In order for the packets to flow correctly, I added the following rules to my machine's local firewall:

-A INPUT -s 10.42.0.0/24 -j ACCEPT
-A FORWARD -d 10.42.0.0/24 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.42.0.0/24 -j ACCEPT
-A INPUT -d 224.0.0.251 -s 10.42.0.1 -j ACCEPT
-A INPUT -d 239.255.255.250 -s 10.42.0.1 -j ACCEPT
-A INPUT -d 10.42.0.255 -s 10.42.0.1 -j ACCEPT
-A INPUT -d 10.42.0.1 -s 10.42.0.0/24 -j ACCEPT
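
To see what the destinations in these rules are, here is a small Python sketch using the standard ipaddress module. 224.0.0.251 is the mDNS multicast address and 239.255.255.250 is the one used by SSDP; the describe helper is hypothetical and only classifies addresses relative to the 10.42.0.0/24 hotspot subnet used above.

```python
import ipaddress

HOTSPOT_NET = ipaddress.ip_network("10.42.0.0/24")


def describe(address: str) -> str:
    """Classify an address relative to the hotspot subnet."""
    ip = ipaddress.ip_address(address)
    if ip.is_multicast:
        return "multicast"
    if ip == HOTSPOT_NET.broadcast_address:
        return "hotspot broadcast"
    if ip in HOTSPOT_NET:
        return "hotspot host"
    return "other"


if __name__ == "__main__":
    for addr in ["224.0.0.251", "239.255.255.250", "10.42.0.255", "10.42.0.1"]:
        print(addr, "->", describe(addr))
```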

Linux.conf.au 2020 – Thursday – Keynote – Vanessa Teague

Keynote: Who cares about Democracy? by Vanessa Teague

The techniques for verifying electronic elections are probably too difficult for real voters to use.

The ones that have been deployed have lots of problems

Complex maths for end-to-end verifiable elections
– people can query their votes to verify they were recorded
– votes are safely mixed so others can’t check.

Swisspost/Scytl System
– 2 bugs. One in the shuffling, one in decryption proof

End-to-end verifiable elections: limitations and criticism

  • Users need to do a lot of careful work to verify
  • If you don’t do it properly you can be tricked
  • You can ( usually ) prove how you voted
    • Though not always, and usually not in a polling-place system
  • Verification requires expertise
  • Subtle bugs can undermine security properties

What does all this have to do with NSW iVote?

  • Used Closed source software
  • Some software available under NDA afterwards
  • Admitted it was affected by the first Swiss bug. This was while early voting was occurring
  • Also said the 2nd Swiss bug wasn’t relevant.
  • After the code was available they found it was relevant; a patch had been applied but it didn’t fix the problem
  • NSW law for election software is all about penalties for releasing information on problems.

Victoria has passed a bill that allows elections to be conducted via any method which is aimed at introducing electronic voting in future elections

Electronic Counting of Paper Records

  • Various areas have auditing software that runs against votes
  • This only works on FPTP elections, not instant-runoff elections
  • Created some auditing software that should work; this was tested using some votes from San Francisco elections
  • A sample of ballots is taken and the physical ballot should match what the electronic one said it is.
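
The sampling step in the last point can be sketched in a few lines of Python. This is only the comparison part, not a full risk-limiting audit: a real audit would derive the sample size from the margin and the chosen risk limit. The function name and data layout are assumptions.

```python
import random


def sample_audit(paper, electronic, sample_size, seed):
    """Compare a random sample of paper ballots with their electronic records.

    `paper` and `electronic` map ballot id -> recorded vote.
    Returns the sampled ballot ids whose two records disagree.
    """
    rng = random.Random(seed)                # a publicly chosen seed makes the sample reproducible
    sampled = rng.sample(sorted(paper), sample_size)
    return [ballot_id for ballot_id in sampled
            if paper[ballot_id] != electronic.get(ballot_id)]
```

An empty result over a large enough sample is evidence that the electronic records match the paper; any mismatch is a reason to look at more ballots.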

Australian Senate vote

  • Auditing not done, since not mandated in law

What can we do

  • Switzerland has laws around transparency, privacy and verification
  • NSW Internet voting laws are oriented around protecting the vendors by keeping the code secret
  • California has laws about auditing
  • Australian Senate scrutineering rules say nothing about computerised scanning and auditing
  • Australia should require
    • a meaningful statistical audit of the paper ballots
    • with meaningful observation by scrutineers

In Summary

  • Verifiable e-voting at polling place is feasible
  • over the Internet is an unsolved problem
  • The Senate count at present provides no evidence of accuracy
  • but would if a rigorous statistical audit is mandated

How else to use verifiable voting technology?

  • Crowdsourcing amendments to legislation with a chance to vote up or down
  • Open input into parliamentary questions
  • A version for teenagers to practice debating what they choose


January 15, 2020

Digital excellence in Ballarat

In December I had the opportunity to work with Matthew Swards and the Business Improvements team in the Ballarat Council to provide a little support for their ambitious digital and data program. The Ballarat Council developed the Ballarat Digital Services Strategy a couple of years ago, which is excellent and sets a strong direction for human centred, integrated, inclusive and data driven government services. Councils face all the same challenges that I’ve found in Federal and State Governments, so many of the same strategies apply, but it was a true delight to see some of the exceptional work happening in data and digital in Ballarat.

The Ballarat Digital Services Strategy has a clear intent which I found to be a great foundation for program planning and balancing short term delivery with long term sustainable architecture and system responsiveness to change:

  1. Develop online services that are citizen centric and integrated from the user’s perspective;
  2. Ensure where possible citizens and businesses are not left behind by a lack of digital capability;
  3. Harness technology to enhance and support innovation within council business units;
  4. Design systems, solutions and data repositories strategically but deploy them tactically;
  5. Create and articulate clear purpose by aligning projects and priorities with council’s priorities;
  6. Achieve best value for ratepayers by focusing on cost efficiency and cost transparency;
  7. Build, lead and leverage community partnerships in order to achieve better outcomes; and
  8. Re-use resources, data and systems in order to reduce overall costs and implementation times.

The Business Improvement team has been working across Council to try to meet these goals, and there has been great progress on several fronts from several different parts of the Council. I only had a few days but got to see great work on opening more Council data, improving Council data quality, bringing more user centred approaches to service design and delivery, exploration of emerging technologies (including IoT) for Council services, and helping bring a user-centred, multi-disciplinary and agile approach to service design and delivery, working closely with business and IT teams. It was particularly great to see cross Council groups around big ticket programs to draw on expertise and capabilities across the organisation, as this kind of horizontal governance is critical for holistic and coordinated efforts for big community outcomes.

Whilst in town, Matthew Swards and I wandered the 5 minutes walk to the tech precinct to catch up with George Fong, who gave us a quick tour, including to the local Tech School, as well as a great chat about digital strategies, connectivity, access, inclusiveness and foundations for regional and remote communities to engage in the digital economy. The local talent and innovation in Ballarat is great to see, and in such close vicinity to the Council itself! The opportunities for collaboration are many and it was great to see cross sector discussions about what is good for the future of Ballarat :)

The Tech School blew my mind! It is a great State Government initiative to have a shared technology centre for all the local schools to use, and included state of the art gaming, 3D digital and printing tech, a robotics lab, and even an industrial strength food lab! I told a few people that people would move to Ballarat for their kids to have access to such a facility, to which I was told “this is just one of 10 across the state”.

It was great to work with the Business Improvement team and consider ways to drive the digital and data agenda for the Council and for Ballarat more broadly. It was also great to be able to leverage so many openly available government standards and design systems, such as the GDS and DTA Digital Service Standards and the NSW Design System. Open government approaches like this make it easier for all levels of government across the world to leverage good practice, reuse standards and code, and deliver better services for the community. It was excellent timing that the Australian National API Design Standard was released this week, as it will also be of great use to Ballarat Council and all other Councils across Australia. Victoria has a special advantage as well because of the Municipal Association of Victoria (MAV), which works with and supports all Victorian Councils. The amount of great innovation and coordinated co-development around Council needs is extraordinary, and you could imagine the opportunities for better services if MAV and the Councils were to adopt a standard Digital Service Standard for Councils :)

Many thanks to Matt and the BI team at Ballarat Council, as well as those who made the time to meet and discuss all things digital and data. I hope my small contribution can help, and I’m confident that Ballarat will continue to be a shining example of digital and data excellence in government. It was truly a delight to see great work happening in yet another innovative Local Council in Australia, it certainly seems a compelling place to live :)

Linux.conf.au 2020 – Wednesday – Session 3 – FLOSS Leadership and Citizenship

Open collaborations: leadership succession and leadership success – Anne Smith & Myk Dowling

  • Started playing Kerbal Space Program and using lots of mods to it.

KSP-CKAN

  • Comprehensive Kerbal Archive Network
  • 150k downloads of a previous release, 72k of last release
  • 1035 stars on GitHub
  • 124 releases from 16 developers
  • Written in C#

Why was the project a success out of around 1.4 million projects?

Conway’s Law

  • FOSS projects are generally modular
    • C and C-derived languages are predictive of success
    • Portability predictor of success
  • Layered Development

83% of FOSS Projects fail. 46% before and 37% after a stable release

How do projects organise?

  • First the founder 1-2
  • Then a belt of users emerges
  • Then a periphery – active users
  • A core of developers emerges
  • Some formality emerges

Onboarding People

  • Relying on self-motivated people limits the number of people who will join your team
  • If you lose people by brushing them off you reduce your team diversity, team diversity gives increased likelihood of success
  • From the core to the periphery there is an order of magnitude decrease in activity but an order of magnitude increase in size.
  • Therefore there is roughly a 1:1 level of work between the two, which is about the same ratio of code:support work that is needed.
  • Flat structures are not stable; FOSS teams self-organise into a complex of a dual-layer structure
  • Leaders should prioritise the people on the periphery. Many join for a short term need, the leader has to give them other reasons to stick around.

Links to other Projects

  • Friction with Mod authors. Mod authors who thought CKAN installed things the wrong way and caused problems got annoyed.
  • Some authors of modules that were under FOSS asked for it to be removed, which CKAN resisted doing.
  • CKAN was mostly orientated towards users and not so much towards the authors
  • Significant group of mod authors considered opting out of CKAN
  • Speaker proposed a policy that allowed mod authors to delete their mods

Leadership

  • Strong technical contributions
  • Participatory behavior
  • Organisation building behaviors

Leadership origin and style

  • Typically the initial leader/s are the founder/s
  • Often shared
  • Leaders may move from core to periphery without losing the position
  • Organisation focus vs Product (technical) focus
  • People with both skills are the ones selected for leadership

CKAN in Transition

  • Removed mods as requested
  • Which broke things for some time
  • Leadership got transferred over
  • Original technical-orientated leader stepped back
  • A more Organizational-orientated leader took over
  • A clear and public succession is much better. Although some people still dropped out.
  • But better than an acrimonious fork

Leadership Transitions

  • Make them speedy and smooth
  • Happen at the speed of military coups
  • Limited participation from a predecessor assists in a smooth change
  • Establishing succession rules helps

Review the state of your project’s public-facing website from the POV of the peripheral people you want to attract.

Open Source Citizenship by Josh Simmons

Healthy Projects are vital which is why many companies are investing in projects

  • They don’t just need money

What are companies doing now?

  • Upstreaming contributions
  • Contributing to the ecosystem
  • Paid contributors on staff (full or part time)
    • Hire out of the Project contributors
  • Supporting with money, infrastructure etc. Both projects directly and other things
  • Programs to help contributors get started.
  • Sharing their experience

What companies provide is not always what communities want

What are Communities asking for?

  • Volunteer design,
  • UX/UI
  • Project management
  • technical writing
  • data science
  • marketing/PR

and yet, code still dominates. These skills need onramps to contribute to your projects.

  • Contribute beyond what the company needs
  • Projects want testing and QA resources
  • Fund conference travel for contributors
  • Event Space
  • Open Source friendly contracts for employees who contribute to Open Source – See the “Contract Patch Program”
  • Jobs for the maintainers and contributors when heavily relying on their work
    • If the maintainers are not getting paid that is a risk for the business
  • Encourage Universities to give students credit for contributing to FLOSS
  • Abide by community norms

Building a Culture of Open Source Citizenship

  • Enumerate and value your dependencies
  • Raise internal awareness
  • Incentivise your people to contribute to open source
  • train, train and train
  • Be Patient

For FLOSS Projects

  • Make it easy to learn about your project
  • Have clear project governance and licensing
    • Say what you are looking for
    • We want to know the investment we make in you is going to be used well and in a transparent way
    • Have a way to receive Money
    • Look at being a member of a larger organisation like Software Freedom Conservancy
    • See also Open Collective if you are just starting out
  • Have a plan for how you are going to use the money
  • Be prepared to work with corporate timelines
  • Be prepared to onboard new contributors
    • Contributor documentation


Linux.conf.au 2020 – Wednesday – Session 2 – Unix Legacy / Social Media Research

What UNIX Cost Us by Benno Rice

Not everything is a file

Connecting to a USB device:

  • Windows – not too bad
  • Mac – a little weird
  • Linux – Lots of weird file operations. ioctl to pass data back and forth

Even worse API for creating usb_fs device. Lots of writing random data to random files.

But this is all behind a nice library?

  • Yeah but it is still a mess under the hood

Got a Byte? – Unix IO model

  • Works okay on small slow machines with simple slow interfaces
  • Doesn’t work so well with Internet, blocking
  • poll still has performance limitations
  • kevent api looked nice but Linux got epoll instead (but focuses around file descriptors)
  • But they are all still synchronous
  • Windows has Async calls

Unix is Tied to its history

  • Windows is newer so could learn from what came before and targeted newer hardware

C is for Colonialism

  • Farming in Europe
    • Moved to Australia, everything they knew about farming doesn’t work any more.
  • The PDP-11 was what Unix originally ran on, with a simple process model.
  • Modern CPUs are not very simple
  • New CPUs lie to the OS about what the state of the machine really is (see Spectre).
  • C is not built to handle this.
  • C doesn’t handle
    • Vectorisation
    • Structure layout and padding
    • Arrays, pointers etc
  • We are not on a PDP-11 anymore
  • We have failed to evolve our CPUs and C because they are locked to each other
  • “C is not a Low Level Language” – Article

The UNIX Philosophy Problem

  • Lots of different definitions
  • Pipes seem important
  • Everything I like about using computers these days tends to be big integrated desktop tools.

Unix Suited its time

  • By accident it became the thing we all use
  • That time was a long time ago

How we run the community has also evolved.

Privacy is not Binary: A discussion of data systems, ethics, and human rights by Elizabeth Alpert and Amelia Radke

I was a little late to this talk so missed out the first 10-15 minutes

Social Media data reuse

  • Used by the providers
  • Governments
  • Other users
  • Malicious Users

Chucking lots of data into an “AI” is seen as yielding interesting and cool data.

Within Academia

  • Risk management. Aware of harms, mitigated, risk/reward

Is Social Media data public or private?

  • It was shared with the expectation of a certain context
  • Hard to write things for your friends but keep random 3rd parties in mind
  • Inferring personal information -> Dangerous
    • Especially when you are trying to infer “protected” characteristics like sexuality or religion
  • Consent? – Tricky
  • Anonymizable? – Doesn’t work

Perceptions of Risks

  • At risk groups usually given higher protection
  • Privacy is a cultural concept
  • Cultural Maps

How do we do things better

  • Ethics can’t be just one person’s responsibility, it has to be in all decisions
  • Who does this belong to?
  • How do they want it to be shared?


Linux.conf.au 2020 – Wednesday – Session 1 – K8s & Security Advice

Building a zero downtime Kubernetes cluster by Feilong Wang

Working for Catalyst Cloud. Catalyst Cloud especially appealing to NZ customers who don’t want latency of going to Australia

Zero Downtime in K8s Context
– Downtime of the User applications
– Downtime of the k8s cluster

The ultimate goal is zero downtime for the customer applications.

User Applications

  • Replicas >2 (ideally >3)
  • PodDisruptionBudget with minAvailable
  • Correct RollingUpdate strategy
  • Connection Draining (using readinessProbe, handle SIGTERM)
    • Use a preStop hook for apps that don’t handle SIGTERM
  • HTTP Keep-Alive
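
The connection draining item above can be illustrated on the application side. This is a minimal sketch, assuming an app that exposes a readiness flag to its readinessProbe endpoint; the class and method names are hypothetical and not from the talk. On SIGTERM it stops reporting ready, refuses new requests, and only allows exit once in-flight requests have finished.

```python
import signal


class GracefulShutdown:
    """Tracks readiness and in-flight requests so a pod can drain on SIGTERM."""

    def __init__(self):
        self.ready = True    # what a readiness endpoint would report
        self.in_flight = 0   # requests currently being served

    def handle_sigterm(self, signum, frame):
        # Stop advertising readiness so no new traffic is routed here;
        # requests already in flight are allowed to finish.
        self.ready = False

    def start_request(self):
        if not self.ready:
            return False     # refuse new work while draining
        self.in_flight += 1
        return True

    def finish_request(self):
        self.in_flight -= 1

    def can_exit(self):
        return not self.ready and self.in_flight == 0


if __name__ == "__main__":
    shutdown = GracefulShutdown()
    signal.signal(signal.SIGTERM, shutdown.handle_sigterm)
```

The process would keep serving until can_exit() returns True, which needs to happen within the pod's termination grace period.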

Zero Downtime for the K8s Cluster

  • Planned maintenance (eg an upgrade)
  • Unexpected node broken

Planned

  • Cordon and drain nodes, upgrade, uncordon

Unplanned Node Broken

  • Failure detection
  • Repair/Healing
  • Manual or Automatic?

Detect Failure

  • Detect failures from outside or inside the cluster

Draino + Cluster Autoscaler

  • Detect node status/condition by draino
  • Drain the node
  • Autoscaler will remove the empty node since its workload is under 50%
  • See also Node Problem Detector

Magnum AutoHealer

  • Support master node and etcd repairing
  • Autoscaler is responsible for repairing
  • The node count is predictable after repairing
  • Currently only supports OpenStack but could be extended

Like, Share and Subscribe: Effective Communication of Security Advice by Serena Chen

Tools and ideas to help you communicate security advice to friends and family who are not in tech.

Security Professionals are a bubble within the Tech Bubble.

Tell the people who are doing the wrong practices (like using Windows XP) that “we can’t help you”.

Nobody chooses to do the wrong thing and be insecure, they are trying to do the best for themselves.

What if people are not bad at security “because it is hard” but because they are not getting the right messages.

Personas

  • Group 1
    • Don’t know what good practice looks like
    • Confused what to do
  • Group 2
    • Knows some good practices
    • But doesn’t do any of them (eg knows about password managers but doesn’t use them)
    • Not sure how to implement

Security is like exercise

  • Ongoing
  • More is better
  • Room for improvement
    • Little steps, not big steps
    • Do one update not a huge change
    • The Perfect is the enemy of the good
    • Personalised for each person

How to Personalise for each person

Consider where on the following spectrums they fall

  • Technological Capability
  • Privacy needs
    • Don’t forget those who need to be visible
  • Likely Adversaries

The Open Internet Tools Project has a big sample of personas

Lay a Path for Progression

  • Couch to 5k for Security
  • Week 1 – Add a password on your phone
  • Week 2 – Change your email password

How do we communicate

  • Tell, sell and shame doesn’t work
  • Lead by example (this is what I do, you could too)
  • Sell doesn’t work
    • Give people successful examples to emulate
    • Give people scripts to help them navigate
  • Shame also doesn’t work
    • Shame Culture means that people don’t ask for advice
    • Try asking “Hey, can I show you a better way to do this?”

“Influencers”

  • Show don’t tell
  • Show their mistakes
  • Let you opt in and not out
  • Give you a range of people to follow
  • I made a youtube channel!
    • Immediately fell back into the habit of Tell, Sell and Shame
    • To reach people requires a degree of vulnerability
    • Experts are the ones who don’t want to reveal their personal security setup
  • What else happened
    • Friends asked me about my security
    • Showed people IRL my personal setup and how I got there
      • Honest about how hard it was
      • A lot of them were already clued up, seeing somebody they know actually doing it encouraged them to take the step and do it

Be Vulnerable

  • Tell them how you screwed up
  • People want to hear how they are not stupid for finding it hard
  • Be nice to people


Linux.conf.au 2020 – Wednesday – Keynote – Donna Benjamin

Keynote: From 2020 to 2121: How will we get there?

Who is watching and why are they watching? Why does it matter?

People install Siri and other personal assistants. Cameras are everywhere.

We are making it too easy for the bad guys.

But makers of free and open source software are also helping the persuasion industry. Are we responsible for that?

The Why matters. Is the tech deterring crime, helping rescue people or used for repressing people?

Observation + Suspicion = Surveillance

From here to 2021

How to make the future happen. Act now to create what you’ll need when you get there. Pack like a Hiking trip

The Four Powers – Information, Relationships, Resources and Decision Making

What is something small you can do now to make the future better? Donna is going to take steps to improve our herd immunity to mass surveillance

https://etherpad.wikimedia.org/p/LCA2121

Steps to take to more evenly distribute power now and more evenly distribute the future in 2021

Open Australia – Run various websites
– Putting Hansard online in machine readable format
– More easily submit freedom of information requests

Appreciative Inquiry


January 14, 2020

Linux.conf.au 2020 – Tuesday – Session 3 – Container Miniconf

Unsafe Defaults: Deploying Kubernetes Safer(ish) – James

Overview of Kubernetes

  • A compromised container is very close to being a compromised host
  • While you shouldn’t curl|bash the attacker can do it to get the latest exploits.

Three Quick things for some easy wins

  • The Kubernetes API is completely open from localhost. This is no longer required but old clusters and some upgraded clusters may still have it.
  • Put a Valid certificate on the cluster or at least one you can keep track of.
  • Get rid of unauthenticated user roles as much as possible.
  • Check you don’t still have “forever tokens”
  • A Good idea not to give service tokens to most pods.
    automountServiceAccountToken: false

PodSecurityPolicy

  • Keep an eye on
  • New
  • You need good RBAC
  • Have a look at k-rail

etcd

  • Can turn on authentication
  • Can turn on TLS between peers and clients
  • Can encrypt on disk
  • Can restrict it with a firewall

Every Image Has A Purpose by Allen Shone

Docker Images

  • What are they anyway
  • A base definition to prepare a filesystem for execution as a container
  • Caching mechanism
  • Reproducible
  • Great way to share runtime circumstances
  • A comprehensive environment structure

Layers

  • image is a series of layers
  • Minimizing layers makes things better
  • Structure the image build process to get the best set of images

Basic Uses

  • Use the most appropriate image
  • A small fix can add up

Images in Production / Customers facing envs

  • When deploying containers, be as precise as possible.
  • The image should be ready to go without further work
  • Keep the image as small and simple as possible
  • “FROM golang:alpine” in testing
  • “FROM scratch” in production
  • Two images but they serve different purposes

Development

  • Possible to use the same image as previously
  • Bring in some extra debug tools etc, mocks for other services

Trimming the final image to be very specific

  • Start with the production image and add extra layers of stuff

Deployed Considerations

  • Some things only come into consideration once they are deployed
  • Instead of creating a big general container, create two containers in a pod that share a file system
  • Configuration should be injected, as an env-specific setup
  • Images should be agnostic

Extras

  • Look at using the .dockerignore file
  • Use image scanning tools (Dive and Clair)
  • A little preparation up front can prevent a lot of headache later


Writing a terraform remote state server


Terraform is a useful tool for deploying cloud resources. This post isn’t an introduction to terraform, so I’ll assume you already know and love it. If you want more, then this getting started guide would be a sensible start.

At its most basic level, terraform deploys cloud resources and stores information about those resources in a file on local disk called terraform.tfstate — it needs that state information so it can make later changes to the deployment, be those modifying resources in use or tearing the whole deployment down. If you had an operations team working on an environment, then you could store the tfstate file in git or a shared filesystem so that the entire team could manage the deployment. However, there is nothing with that approach that stops two members of the team making overlapping changes.

That’s where terraform state servers come in. State servers can implement optional locking, which stops overlapping operations from happening. The protocol that these servers talk isn’t well documented (that I could find). I wanted to explore that more, so I wrote a simple terraform HTTP state server in python.

To use this state server, configure your terraform file as per demo.tf. The important bits are:

terraform {
  backend "http" {
    address = "http://localhost:5000/terraform_state/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_method = "PUT"
    unlock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    unlock_method = "DELETE"
  }
}

Where the URL to the state server will obviously change. The UUID in the URL (4cdd0c76-d78b-11e9-9bea-db9cd8374f3a in this case) is an example of an external ID you might use to correlate the terraform state with the system that requested it be built. It doesn’t have to be a UUID, it can be any string.

I am using PUT and DELETE for locks due to limitations in the HTTP verbs that the python flask framework exposes. You might be able to get away with the defaults in other languages or frameworks.

To run the python server, make a venv, install the dependencies, and then run:

$ python3 -m venv ~/virtualenvs/remote_state
$ . ~/virtualenvs/remote_state/bin/activate
$ pip install -U -r requirements.txt
$ python stateserver.py
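
To give a feel for what a state server has to do, here is a minimal in-memory sketch of the state and lock handling. It is not the stateserver.py from this post and it leaves out the HTTP layer entirely; the class and method names are hypothetical. It assumes, as I understand the terraform HTTP backend, that a lock request carries a JSON description of the lock holder and that a 423 (Locked) response tells terraform someone else holds the lock.

```python
import json


class StateStore:
    """In-memory model of the state and lock endpoints (illustrative only)."""

    def __init__(self):
        self.states = {}  # external ID -> state document
        self.locks = {}   # external ID -> lock info supplied by the client

    def get_state(self, state_id):
        if state_id not in self.states:
            return 404, ""
        return 200, json.dumps(self.states[state_id])

    def update_state(self, state_id, body):
        self.states[state_id] = json.loads(body)
        return 200, ""

    def lock(self, state_id, body):
        if state_id in self.locks:
            # Already locked: report the existing lock holder to the caller.
            return 423, json.dumps(self.locks[state_id])
        self.locks[state_id] = json.loads(body)
        return 200, ""

    def unlock(self, state_id):
        self.locks.pop(state_id, None)
        return 200, ""


store = StateStore()
first = store.lock("demo", '{"ID": "alice"}')
second = store.lock("demo", '{"ID": "bob"}')
print(first[0], second[0])
```

Each method returns a status code and body, so wiring them to the routes named in the backend configuration is left to whichever web framework you prefer.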

I hope someone else finds this useful.


Linux.conf.au 2020 – Tuesday – Session 2 – Security, Identity, Privacy Miniconf

Privacy and Transparency in the VPN industry by Ruben Rubio Rey

We are at an “Oh Noe!” Moment in the VPN Industry

VPN Advantages

  • Protect your privacy
  • Bypass Geo-Restrictions
  • Beat Censorship
  • Save money on Hotels and Flights
  • Download torrents anonymously
  • Bypass ISP speed regulations
  • Secures Public WIFI

What Can be intercepted?
– Without Encryption: Any Data
– With Encryption: IP and Port

But HTTPS only works if client and server are configured correctly
Client: Rogue root certificate
Servers: CORS, insecure SSL version

Protect Your Privacy

  • Many countries systematically collect data about citizens
  • ISPs collect data, must keep it for two years and make it accessible to agencies
  • USA ISPs can sell information
  • Other countries tried to put in MITM Certs

So Private companies have incentives to protect my data?

The Reality of Private VPN providers

  • Several examples of collecting Data
  • Several examples of them releasing data to agencies
  • Random security and implementation problems
  • Exaggerations in sales pitches
  • Installs a rogue root cert on the user’s machine

Conflict of Interest, what is a business model of the providers?

Stats

  • 59% of Free VPNs in play store had hidden Chinese ownership
  • 86% had privacy policy flaw
  • 85% asked for excessive permissions

Are VPN Companies Needed?

People with non-technical skills need an option

How to Improve the VPN Market?

  • Privacy and Transparency go hand in hand
  • Open Source Provides Transparency
  • End to End open source VPN Company
  • theVPNcompany.com.au

Install your own VPN

Algo and Streisand

Create your own VPN Company using the base for “The VPN Company”

https://thevpncompany.com.au/

Authentication Afterlife: the dark side of making lost password recovery harder by Ewen McNeill

Twitter Account “badthingsdaily” . Fictional Scenarios that might happen to security people. Inspired this talk.

Scenario 1

  • A Big fire took out your main computer
  • You don’t have the computer and you don’t know all your passwords

Recovery Traditional

  • You get email somewhere else. On your phone
  • Click on Forgot my password
  • Repeat until all accounts are recovered

Scenario 2

  • You need to login to your account on a new device
  • All accounts secured with 2FA
  • Your 2FA isn’t working

Recovery

  • Recovery Tokens
  • Alternative 2FA Solution
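
As a rough illustration of the recovery tokens mentioned above, a service could hand out a set of one-time codes when 2FA is enrolled and keep only their hashes. This is a sketch under my own assumptions (code length, SHA-256 storage, function names), not something presented in the talk.

```python
import hashlib
import secrets


def generate_recovery_codes(count=10):
    """Return (codes to show the user once, hashes for the server to store)."""
    codes = [secrets.token_hex(5) for _ in range(count)]
    hashes = {hashlib.sha256(code.encode()).hexdigest() for code in codes}
    return codes, hashes


def redeem(code, stored_hashes):
    """Accept a code at most once by removing its hash when it is used."""
    digest = hashlib.sha256(code.encode()).hexdigest()
    if digest in stored_hashes:
        stored_hashes.remove(digest)
        return True
    return False
```

The codes only help if they are kept somewhere that survives losing the device they protect, which is the point the scenarios in this talk keep returning to.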

Scenario 3

  • Your bag was stolen
  • It had computer, phone and 2FA
  • Can bad guy impersonate you?
  • Can you recover faster than the other guy (or at all?)

Recovery

  • Does your 2FA pop up on your lock screen?
  • So anybody with your computer is able to get this?
  • Race to reset passwords and invalidate your login tokens
  • Maybe you remember your passwords but not your 2FA
  • Recovery questions “Mother’s maiden name”
  • Can be easy to discover, but if it is something random then you have to be able to find it (ie on the password store you just lost)

Multiple alternate authentication methods

  • Primary you use every day
  • One or more backups

If resetting your password every time is easier than remembering your password people will do that.

Attackers will use the easiest authentication method. Eg Contacting the Helpdesk or going into a bank branch office.

But if recovery is too hard you can end up losing access to your account permanently

Recommend: GitHub’s 2FA recovery guide

Scenario 4

Your startup’s founder has left. He has wiped all his computers. Now your cloud provider is threatening to lock you out unless you authenticate using 2FA

  • Hopefully in the password store
  • Or perhaps they no longer work
  • Contact Helpdesk, Account Manager, Lawyer, Social Media (usually the bigger you are and the more you pay the better your chance)
  • Store everything centrally. How do you audit that, regularly?

Scenario 5

A relative dies. Your first step is to log in to all their accounts and work out what should be kept.

This will take months not years. Sometimes you will only find out the account exists when they email you that your account is about to expire.

Personal Observations

  • You will not have access to their cellphone
  • or probably not past the lock screen
  • Anything they told you that was obvious you will forget
  • You will not have access to the password store
  • You may have access to saved passwords in browser
  • Maybe you need to optimise for family being able to access stuff, not complete lockdown.
  • Physical notebook with passwords
  • Consider in advance how you will recover if your 2FA device breaks
  • How will you convince a helpdesk person that you are you?

Personal Mitigations

  • Kawaiicon 2019 “How can I help you” talk by Laura Bell

You Shall Not Pass by Peter Burnett

Moodle is an open source Learning Management System.

  • Legacy System
  • First developed in 1997
  • Open Sourced in 2001
  • New Code is good quality, older stuff not as much

Efforts to improve password policy

  • Password policy was a bit antiquated
  • Best policies come from NIST, 2018 version is good.
  • Don’t force a pattern, Check for compromised passwords, Check for dictionary based and identifying passwords
  • Look at the “Have I been Pwned” API – takes the first 5 characters of the SHA-1 hash of the password.
  • Dictionary checks – Top 10,000 English words might be enough
  • Identifying information – Birthdays, names, cities are things to watch for. Name of the company.
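
The checks in the list above could look something like the following. This is my own sketch, not the Moodle plugin's code: the function names are hypothetical, the word list and personal details are made up, and the network call to the range API is left out. Only the 5-character hash prefix would be sent; the rest of the hash is compared locally against the suffixes the API returns.

```python
import hashlib


def hibp_range_parts(password):
    """Split the SHA-1 of a password into the 5-character prefix sent to the
    range API and the suffix that is compared locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def policy_problems(password, common_words, identifying_info):
    """Return a list of reasons the password should be rejected."""
    lowered = password.lower()
    problems = []
    if lowered in common_words:
        problems.append("dictionary word")
    if any(item.lower() in lowered for item in identifying_info if item):
        problems.append("contains identifying information")
    return problems


prefix, suffix = hibp_range_parts("password")
print(prefix)
print(policy_problems("alice1990", {"password", "dragon"}, ["Alice", "1990"]))
```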

Released as an open source plugin for Moodle

A look at the Authentication Flow

  • Natively supported LDAP etc.
  • Lots of extra plugins implement other methods
  • Had to put MFA in when people using plugins. Difficult to mix
  • Added extra hook on “account related” actions, they would check for MFA etc.
  • Required a bit of work to get merged in.

Implementing MFA

  • MFA is a superset of 2FA implementations
  • Had to do extensible platform
  • Traditional: TOTP, Email
  • Non-Traditional: IP verification, Authentication type (might already have MFA)
  • Design considerations – Keep secure but impact people as little as possible.
  • Different users: Not required, Optional, Forced Upon . So built in the ability for a range of use across platform.
  • Learnings
    • Anything can be used as a factor
    • delicate balance between secure and usable
    • When designing, paranoid is the right mindset
    • Give the least information possible to allow a legit user to authenticate
    • What can the attacker do if this factor is compromised?
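
TOTP is one of the traditional factors listed above. For readers who haven't seen how it works, here is a short sketch of the algorithm from RFC 6238 using only the standard library. It is not the Moodle implementation; the function names, the 30 second step and the one-step drift window are assumptions chosen for the example.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, for_time=None, step=30, digits=6):
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret, submitted, for_time=None, window=1):
    """Accept codes from adjacent time steps to tolerate clock drift."""
    if for_time is None:
        for_time = time.time()
    return any(
        hmac.compare_digest(totp(secret, for_time + drift * 30), submitted)
        for drift in range(-window, window + 1)
    )
```

Widening the window makes life easier for users with drifting clocks but gives an attacker more valid codes to guess, which is the secure versus usable balance noted in the learnings.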

Final Thoughts

  • Long way to go
  • Security is a shifting goalpost
  • Keep on top of new developments


Linux.conf.au 2020 – Tuesday – Session 1 – Security, Identity, Privacy Miniconf

Facebook, Dynamite, Uber, Bombs, and You – Lana Brindley

  • Herman Hollerith
    • Created the punch card, introduced for the 1890 US Census
    • Hollerith leased machines to other people
  • Hollerith machines and infrastructure used by many Census in Europe.
    • Countries with better census infrastructure using Hollerith machines tended to have a higher death rate in the Holocaust
  • Alfred Nobel
    • Invented Dynamite and ran weapons company
  • Otto Hahn
    • Invented Nuclear Fission
  • Eugenics
    • 33 US states have sterilization programmes in place
    • 65,000 Americans sterilized as part of programmes
    • WHO was created as a result.
  • Thalidomide
    • Over-the-counter morning sickness treatment
    • Caused birth defects
    • FDA strengthened

Unintended consequences of technology, result was stronger regulation

Volkswagen emissions cheating and Uber’s Greyball
– Volkswagen engineers went to jail, Uber engineers didn’t

Here are some IT innovations that didn’t lead to real change

  • Medical Devices
    • Therac-25 was a 1980s machine used for treating cancer with radiation
    • Control software had race condition that gave people huge radiation overloads
  • Drive by Wire for Cars
    • Lexus ES350 sudden acceleration
    • Toyota replaced floor mats, not software
    • Car accelerator stuck at full speed and brakes not working
    • No single cause ever identified
  • Deep Fake Videos
  • Killer Robots
    • South Korean Universities came under pressure to stop research, said they had stopped but not confirmed.
  • Chinese Surveillance
    • Checkpoints all through the city; the average citizen goes through them many times per day and has their phone scanned, plus other checks.
    • Cameras with facial recognition everywhere
  • Western Surveillance – Palantir and other companies installing elsewhere
  • Boeing Software – 737 Max

Bad technology should have consequences and until it does people have to avoid things themselves as much as possible and put pressure on governments and companies

The Internet: Protecting Our Democratic Lifeline by Brett Sheffield

Lots of ways technology can protect us (Tor etc) and at the same time plenty of ways technology works against our privacy.

The UN Declaration of Human Rights
Australia is the only major country without a bill of rights.

Ways to contribute
– They Work for you type websites
– Protesting
– Whistleblowers

Democracy Under Threat
– Governments blocking the Internet
– Netblocks.org
– Police harass journalists (AFP raids ABC in Aus)
– Censorship

Large Companies
– Gather huge amounts of information
– Aim for personalisation and monetisation
– Leads to centralisation

Rebuilding the Internet with Multicast
– Scalable
– Happens at the network layer
– Needs to be enabled on all routers in each hop
– Currently off by default

Librecast
– Aims to get multicast in the hands of developers
– Tunnels through non-multicast enabled devices
– Messaging Library
– Transitional tunneling
– Improved routing protocol
– Try to enable in other FOSS projects
– Ensure new standards (WebRTC, QUIC) support multicast



Linux.conf.au 2020 – Tuesday – KeyNote: Sean Brady

Keynote: Drop Your Tools – Does Expertise have a Dark Side? by Dr Sean Brady

Hartford Civic Center

Engineers ignored warnings of problems, kept saying calculations were good. Structure collapsed under light snow load.

People are involved with engineering, therefore it is a people problem

What if possessing expertise has a dark side? The danger isn’t ignorance, it is the illusion of knowledge.

Mann Gulch fire

Why did the firefighters not drop their tools?
Why did they not get in the Escape Fire?

Priming – You get information that primes you to think a certain way.

What if expertise is priming somebody?
– Baseball experts primed to go down the wrong path, couldn’t even stop when explicitly told about the trick.

Firefighters explicitly trained that they are faster runners with tools.

Creative Desperation – Mentally drop your existing tools.



January 13, 2020

Linux.conf.au 2020 – Monday – Opening

Welcome to Country from the Yugambeh people

Main Organisers

Ben Stevens
Joel Addison
  • Sponsors Acknowledged
  • Schedule Outlined
  • Review of location, food, swag and other stuff
  • Charity for raffle is bushfire appeal

January 06, 2020

Setting up VXLAN between nested virt VMs on Google Compute Engine

I wanted to play with a VXLAN mesh between VMs on more than one hypervisor node, but the setup for VXLAN ended up being a separate post because it was a bit long. Read that post first if you want to follow the instructions here.

Now that we have a working VXLAN mesh between our two nodes we can move on to installing libvirt (which is called libvirt-daemon-system on Debian, not libvirt-bin as on Ubuntu):

sudo apt-get install -y qemu-kvm libvirt-daemon-system
sudo virsh net-start default
sudo virsh net-autostart --network default

I’m going to use a little python helper to launch my VMs, so I need some other dependencies as well:

sudo apt-get install -y python3-pip pkg-config libvirt-dev git

git clone https://github.com/mikalstill/shakenfist
cd shakenfist
git checkout 6bfac153d249752b27d224ad9d079095b640498e

sudo mkdir /srv/shakenfist
sudo cp template.debian.xml /srv/shakenfist/template.xml
sudo pip3 install -r requirements.txt

Let’s launch a quick test VM to make sure the helper works:

sudo python3 daemon.py
sudo virsh list

You can destroy that VM for now, it was just testing the install.

sudo virsh destroy ...name...

Next we need to tweak the template that shakenfist is using to start instances so that it uses the bridge for networking (that template is the one you copied to /srv/shakenfist/template.xml earlier). Replace the interface section in the template with this on both nodes:

<interface type='bridge'>
  <mac address={{eth0_mac}}/>
  <source bridge='br-vxlan0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

I know the bridge mentioned here doesn’t exist yet, but we’ll deal with that in a second. Before we start VMs though, we need a way of getting IP addresses to them. shakenfist can configure interfaces using config drive, but I’d prefer to use DHCP because who doesn’t love some additional complexity?

On one of the nodes install docker:


sudo apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Now we can set up DHCP. Create a place for the configuration file:

sudo mkdir /srv/shakenfist/dhcp

And then create the configuration file at /srv/shakenfist/dhcp/dhcpd.conf with contents like this:

default-lease-time 3600;
max-lease-time 7200;
option domain-name-servers 8.8.8.8;
authoritative;

subnet 192.168.200.0 netmask 255.255.255.0 {
  option routers 192.168.200.1;
  option broadcast-address 192.168.200.255;

  pool {
    range 192.168.200.10 192.168.200.254;
  }
}
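
Before starting dhcpd it can be worth checking that every address in the configuration belongs to the declared subnet. The following is a small sketch using Python's `ipaddress` module; the helper names are made up for this example, and the addresses are the subnet and pool range from the configuration above.

```python
import ipaddress

subnet = ipaddress.ip_network("192.168.200.0/24")


def summarise(net):
    """Return the netmask and broadcast address implied by the subnet."""
    return {
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
    }


def outside_subnet(net, addresses):
    """List any configured addresses that do not belong to the subnet."""
    return [a for a in addresses if ipaddress.ip_address(a) not in net]


print(summarise(subnet))
# Pool range from dhcpd.conf; an empty list means both ends are in the subnet.
print(outside_subnet(subnet, ["192.168.200.10", "192.168.200.254"]))
```

Any router or broadcast option that shows up in the `outside_subnet` result is a sign the option lines and the subnet declaration have drifted apart.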

Before we can start dhcpd, we need to move the VXLAN device into a bridge so we can add a device for the DHCP server to it. First off remove the vxlan0 device from the last post:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

And now recreate it with a bridge:

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up
sudo ip link add dhcp-vxlan0 type veth peer name dhcp-vxlan0p
sudo ip link set dhcp-vxlan0p master br-vxlan0
sudo ip link set dhcp-vxlan0 up
sudo ip link set dhcp-vxlan0p up
sudo ip addr add 192.168.200.1/24 dev dhcp-vxlan0

This block of commands:

  • recreated the vxlan0 interface
  • added it to the mesh with the other node again
  • created a bridge named br-vxlan0
  • moved the vxlan0 interface into it
  • created a veth pair called dhcp-vxlan0 and dhcp-vxlan0p
  • moved the peer part of that veth pair into the bridge
  • and then configured an IP on the external half of the veth pair

To make the bridge survive reboots you would need to add it to either /etc/network/interfaces or /etc/netplan/01-netcfg.yaml depending on your distribution, but that’s outside the scope of this post.

You should be able to ping again. From the other node give it a try:

$ ping 192.168.200.1
PING 192.168.200.1 (192.168.200.1) 56(84) bytes of data.
64 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=19.3 ms
64 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=0.571 ms

We need to do something similar on the other node so it can run VMs as well. It is a tiny bit simpler because there won’t be any DHCP there. Remember that you need to change 35.223.115.132 to the IP of your first node:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 35.223.115.132 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up

Note that now we can’t do a ping test because the second VM no longer consumes an IP for the base OS.
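
The two command blocks above differ only in the peer address and in whether the DHCP veth pair is created, so they can be generated from one helper. This is a sketch only: `bridge_commands` is a hypothetical name, it just prints the commands already shown in this post, and it does not run anything.

```python
def bridge_commands(peer_ip, dhcp_address=None, vni=42, underlay="eth0"):
    """Return the ip/bridge commands that rebuild vxlan0 inside br-vxlan0.

    peer_ip is the underlay address of the *other* node. Pass dhcp_address
    (for example "192.168.200.1/24") only on the node that will run dhcpd.
    """
    commands = [
        "ip link set down dev vxlan0",
        "ip link del vxlan0",
        f"ip link add vxlan0 type vxlan id {vni} dev {underlay} dstport 0",
        f"bridge fdb append to 00:00:00:00:00:00 dst {peer_ip} dev vxlan0",
        "ip link add br-vxlan0 type bridge",
        "ip link set vxlan0 master br-vxlan0",
        "ip link set vxlan0 up",
        "ip link set br-vxlan0 up",
    ]
    if dhcp_address:
        commands += [
            "ip link add dhcp-vxlan0 type veth peer name dhcp-vxlan0p",
            "ip link set dhcp-vxlan0p master br-vxlan0",
            "ip link set dhcp-vxlan0 up",
            "ip link set dhcp-vxlan0p up",
            f"ip addr add {dhcp_address} dev dhcp-vxlan0",
        ]
    return commands


if __name__ == "__main__":
    # First node (runs DHCP), pointing at the second node's address.
    for command in bridge_commands("34.70.161.180", "192.168.200.1/24"):
        print("sudo " + command)
```

Printing rather than executing keeps the helper safe to experiment with; the output can be reviewed and pasted into a shell on each node.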

Now we can start the docker container with dhcpd listening on dhcp-vxlan0:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0

This runs dhcpd interactively so we can see what happens. Now try starting a VM on the other node:

sudo python3 daemon.py

You can watch the VM booting using the “virsh console” command with the name of the VM from “virsh list”. The dhcpd process should show you something like this:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0
Internet Systems Consortium DHCP Server 4.3.5
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /data/dhcpd.conf
Database file: /data/dhcpd.leases
PID file: /var/run/dhcpd.pid
Wrote 0 leases to leases file.
Listening on LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   Socket/fallback/fallback-net
Server starting service.
DHCPDISCOVER from ee:95:4d:40:ca:a6 via dhcp-vxlan0
DHCPOFFER on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPREQUEST for 192.168.200.10 (192.168.200.1) from ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPACK on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0

You can see here that our new VM got the IP 192.168.200.10 from the DHCP server! It is moments like this when you don’t realise that this blog post took me hours to write that I feel really smart.
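
If you want to keep track of which VM got which address, the DHCPACK lines in that output are easy to parse. A minimal sketch, assuming the log format shown above; `leases_from_log` is a name invented for this example.

```python
import re

# Matches lines such as:
# DHCPACK on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
ACK = re.compile(
    r"^DHCPACK on (?P<ip>\d+\.\d+\.\d+\.\d+) to (?P<mac>[0-9a-f:]{17})"
    r"(?: \((?P<hostname>[^)]*)\))? via (?P<interface>\S+)$"
)


def leases_from_log(lines):
    """Map each MAC address to the details of the last DHCPACK seen for it."""
    leases = {}
    for line in lines:
        match = ACK.match(line.strip())
        if match:
            leases[match.group("mac")] = match.groupdict()
    return leases
```

Lines other than DHCPACK (DISCOVER, OFFER, REQUEST) are ignored, since only the ACK confirms that the lease was actually handed out.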

If we started a VM on the first node (the same command as for the second node), we’d now have two VMs on a virtual network which had working DHCP and could ping each other. I think that’s enough for one evening.

January 05, 2020

Setting up VXLAN on Google Compute Engine

So my ultimate goal here is to try out VXLAN between some VMs on instances in Google compute engine, but today I’m just going to get VXLAN working because that took a fair bit longer than I expected. First off, boot your instances — because I will need nested virt later I chose two instances on Google Cloud. Please note that you need to do a bit of a dance to turn on nested virt there. I also chose to use Debian for this experiment:

gcloud compute instances create vx-1 --zone us-central1-b --min-cpu-platform "Intel Haswell" --image nested-vm-image

Now do those standard things you do to all new instances:

sudo apt-get update
sudo apt-get dist-upgrade -y

Now let’s set up VXLAN between the two nodes, with a big nod to this web page. First create a VXLAN interface on each machine (if you care about the port your VXLAN traffic is on being to IANA standards, see the postscript at the end of this):

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0

Now we need to put the two nodes into a mesh, where 34.70.161.180 is the IP of the node we are not running this command on and the IP address for the second command needs to be different on each machine.

sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip addr add 192.168.200.1/24 dev vxlan0
sudo ip link set up dev vxlan0
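
With more than two nodes, each node needs one fdb entry per peer, so a full mesh of n nodes needs n × (n − 1) entries overall. A sketch that generates them, where `mesh_commands` is a hypothetical helper and the node list is whatever underlay addresses you have:

```python
from itertools import permutations


def mesh_commands(nodes, device="vxlan0"):
    """Map each node's underlay IP to the fdb commands to run on that node."""
    commands = {node: [] for node in nodes}
    for local, remote in permutations(nodes, 2):
        commands[local].append(
            f"bridge fdb append to 00:00:00:00:00:00 dst {remote} dev {device}"
        )
    return commands


if __name__ == "__main__":
    for node, cmds in mesh_commands(["34.70.161.180", "35.223.115.132"]).items():
        print(f"# on {node}")
        for cmd in cmds:
            print("sudo " + cmd)
```

For the two nodes here that is one command each; ten nodes would already need ninety entries across the mesh.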

I am pretty sure that this style of mesh (all nodes connected) wouldn’t scale to non-trivial sizes, but hey, baby steps right? Finally, because we’re using Google Cloud we need to add firewall rules to allow our traffic into the instances:

Note that these rules are a source of confusion for me right now. I wanted (and configured) VXLAN. So why do I need to allow OTV for this to work? I suspect Linux has politely ignored my request and used OTV not VXLAN for my traffic.

We should now be able to ping those newly configured IP addresses from each machine:

ping 192.168.200.2 -c 1
PING 192.168.200.2 (192.168.200.2) 56(84) bytes of data.
64 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=1.76 ms

Which produces traffic like this on the underlay network:

tcpdump -n -i eth0 host 34.70.161.180
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:01:58.159092 IP 10.128.0.9.59341 > 34.70.161.180.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.1 > 192.168.200.2: ICMP echo request, id 20119, seq 1, length 64
09:01:58.160786 IP 34.70.161.180.48471 > 10.128.0.9.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.2 > 192.168.200.1: ICMP echo reply, id 20119, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Hopefully this is helpful to someone else. Thanks again to Joe Julian for a very helpful post.

Postscript: Dale Shaw pointed out on twitter that I might still be talking VXLAN, just on a weird port. This is supported by this comment I found on the internets: “when VXLAN was first implemented in linux, UDP ports were not specified. Many vendors use 8472, and Linux uses the same port. Later, IANA allocated 4789 as the port. If you need to use the IANA port, you need to specify it with dstport”.
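
One way to sanity check the "still VXLAN, just on a weird port" theory is to look at the 8-byte header in front of the encapsulated frame. The sketch below follows the VXLAN header layout in RFC 7348 (8 flag bits, 24 reserved bits, a 24-bit VNI, 8 reserved bits); the function names are invented for this example. The flags value 0x08 and instance 42 in the tcpdump output above are consistent with that layout.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # shown as "flags [I] (0x08)" by tcpdump
LINUX_DEFAULT_PORT = 8472    # port Linux uses when dstport is left at 0
IANA_PORT = 4789             # port later allocated by IANA


def build_header(vni: int) -> bytes:
    """Pack an 8-byte VXLAN header carrying the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_header(data: bytes) -> dict:
    """Extract the flags byte and the VNI from the start of a UDP payload."""
    word1, word2 = struct.unpack("!II", data[:8])
    return {"flags": word1 >> 24, "vni": word2 >> 8}
```

Feeding `parse_header` the first eight bytes of a captured UDP payload and getting back flags 8 and VNI 42 would support the postscript's explanation.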

January 03, 2020

Playing with the python prometheus query API

The last few days have been a bit icky around here, with my house apparently proudly residing in the major city with the dirtiest air in the world. So, I needed a distraction…

It has also been quite hot, so I wondered how my energy usage was going. I have prometheus monitoring of my power draw, so now seemed as good a time as any to learn how to do some historical querying over the API. I ended up with a python script which can output things like this: “Yesterday had a maximum temperature of 38 and we used 28.36 kwh. The average for similar days is 25.56 kwh.”

The code is on github if it is of interest to others. I am sure I could push more of this processing down into the prometheus engine, but I couldn’t see how to do it today. Hints welcome!
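
For anyone who has not used the API before, here is a rough sketch of a range query against the Prometheus HTTP API (`/api/v1/query_range`) using only the standard library. It is not the script from github: the server URL, the metric name and the assumption that the metric is instantaneous power in watts are all placeholders for illustration.

```python
import json
import urllib.parse
import urllib.request


def range_query_url(base_url, query, start, end, step):
    """Build a query_range URL; start and end are Unix timestamps in seconds."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url}/api/v1/query_range?{params}"


def fetch(url):
    """Fetch and decode the JSON response (needs a reachable server)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def samples(response):
    """Return (timestamp, value) pairs for the first series in a matrix result."""
    series = response["data"]["result"]
    if not series:
        return []
    return [(float(ts), float(value)) for ts, value in series[0]["values"]]


def energy_kwh(points, step_seconds):
    """Rough rectangle-rule estimate: each sample in watts covers one step."""
    return sum(watts for _, watts in points) * step_seconds / 3_600_000
```

A call such as `samples(fetch(range_query_url("http://localhost:9090", "power_watts", start, end, 60)))` would return one point per minute, with `power_watts` standing in for whatever your exporter calls the metric.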

January 01, 2020

Receiving slow-scan television images from the International Space Station

Thanks to a fellow VECTOR volunteer Nick Doyle, I found out that the International Space Station would be broadcasting slow-scan television images at the end of the year. I decided to try and pick those up with my handheld radio.

Planning

From the official announcement, I got the frequency (145.800 MHz) and the broadcast times.

Next I had to figure out when the ISS would be passing over my location. Most of the ISS tracking websites and applications are aimed at people wanting to see the reflection of the sun on the station and so they only list the passes during nighttime before the earth casts a shadow that would prevent any visual contacts.

Thankfully, Nick found a site which has an option to show all of the passes, visible or not and so I was able to get a list of upcoming passes over Vancouver.

Hardware

From a hardware point-of-view, I didn't have to get any special equipment. I used my Kenwood D72 and an external Comet SBB5 mobile antenna.

The only other piece of equipment I used was a 2.5mm mono adapter, which let me connect a 3.5mm male-male audio cable between the speaker port of the radio and the microphone input of my computer.

Software

The software I used for the recording was Audacity set to a sampling rate of 48 kHz.

Then I installed qsstv and configured it to read input from a file instead of the sound card.

Results

Here is the audio I recorded from the first pass (65 degrees at the highest point) as well as the rendered image:

The second pass (60 degrees) was not as successful since I didn't hold the squelch open and you can tell from the audio recording that the signal got drowned in noise a couple of times. This is the rendering of that second pass:

Tips

The signal came through the squelch for only about a minute at the highest point, so I found it best to open the squelch fully (F+Moni) as soon as the bird is visible.

Another thing I did on a third pass (16 degrees at the highest point -- not particularly visible) was to plug the speaker out of my radio into a Y splitter so that I could connect it to my computer and an external speaker I could take outside with me. Since I was able to listen to the audio, I held the antenna and tried to point it at the satellite's general direction as well as varying the orientation of the antenna to increase the signal strength.

AudioBooks – December 2019

Call the Ambulance! by Les Pringle

Stories from a British Ambulance driver in the late-1970s and 1980s. A good range of stories from the funny to the tragic. 7/10

Permanent Record by Edward Snowden

An autobiography by the NSA Whistle-blower. Mostly a recounting of his life, career and circumstances that led up to him leaking. Interesting. 7/10

Life in the Middle Ages by Richard Winston

As the title describes. Unusually for English Language books it focuses on France. Not much history just daily life & only 5h long. Probably works better with pictures. 6/10

Dr Space Junk vs the Universe: Archaeology and the Future by Alice Gorman

A Mix of topics. Some autobiography & how she worked her way into the archeology of spaceflight. Plus items of Space History & comparisons with earth archeology. But it works 8/10

Little House in the Big Woods by Laura Ingalls Wilder

Only 3h 40m long and roughly covering a year. The author describes her life (aged 5-6) and her family in a cabin in Wisconsin in the early 1870s. 1st in the series. 7/10

Abraham Lincoln: A Life (Volume One) by Michael Burlingame

50h and covers up to his 1st inauguration. Not a good 1st Lincoln bio to read but very good. Some repetition as multiple sources are quoted on some points. 7/10

BlueHackers crowd-funding free psychology services at LCA and other conferences

BlueHackers has in the past arranged for a free counsellor/psychologist at several conferences (LCA, OSDC). Given the popularity and great reception of this service, we want to make this a regular thing and try to get this service available at every conference possible – well, at least Australian open source and related events.

Right now we’re trying to arrange for the service to be available at LCA2020 at the Gold Coast, we have excellent local psychologists already, and the LCA organisers are working on some of the logistical aspects.

Meanwhile, we need to get the funds organised. Fortunately this has never been a problem with BlueHackers, people know this is important stuff. We can make a real difference.

Unfortunately BlueHackers hasn’t yet completed its transition from OSDClub project to Linux Australia subcommittee, so this fundraiser is running in my personal name. Well, you know who I (Arjen) am, so I hope you’re all ok with that.

We have a little over a week until LCA2020 starts, let’s make this happen! Thanks. You can donate via MyCause.

Speeding up Blackbird boot: the SBE

The Self Boot Engine (SBE) is a small embedded PPE42 core inside the POWER9 CPU which has the unenviable job of getting a single POWER9 core ready enough to start executing instructions out of L3 cache, and poking some instructions into said cache for the core to start executing.

It’s called the “Self Boot Engine” as in generations prior to POWER8, it was the job of the FSP (Service Processor) to do all of the booting for the CPU. On POWER8, there was still an SBE, but it was a custom instruction set (this was the Power On Reset Engine – PORE), while the PPE42 is basically a 32bit powerpc core cut straight down the middle (just the way to make it awkward for toolchains).

One of the things I noted in my post on Booting temporary firmware on the Raptor Blackbird is that we got serial console output from the SBE. It turns out one of the things explicitly not enabled by Raptor in their build was this output as “it made the SBE boot much slower”. I’d actually long suspected this, but hadn’t really had the time to delve into it.

For POWER9, the firmware for the SBE is now open source, as are the ppe42-binutils and ppe42-gcc toolchains for it. This means we can hack on it!

WARNING: hacking on your SBE firmware can be relatively dangerous, as it’s literally the first thing that needs to work in order to boot the system, and there isn’t (AFAIK) a publicly documented easy way to re-flash your SBE firmware if you mess it up.

Seeing as we saw a regression in boot time with the UART output enabled, we need to look at the uartPutChar() function in sbeConsole.C (error paths removed for clarity):

static void uartPutChar(char c)
{
    #define SBE_FUNC "uartPutChar"
    uint32_t rc = SBE_SEC_OPERATION_SUCCESSFUL;
    do {
        static const uint64_t DELAY_NS = 100;
        static const uint64_t DELAY_LOOPS = 100000000;

        uint64_t loops = 0;
        uint8_t data = 0;
        do {
            rc = readReg(LSR, data);
...
            if(data == LSR_BAD || (data & LSR_THRE))
            {
                break;
            }
            delay(DELAY_NS, 1000000);
        } while(++loops < DELAY_LOOPS);

...
        rc = writeReg(THR, c);
...
    } while(0);

    #undef SBE_FUNC
}

One thing you may notice if you’ve spent some time around serial ports is that it’s not using the transmit FIFO! According to Wikipedia the original 16550 had a broken FIFO, but we’re certainly not going to be hooked up to an original rev of that silicon.

To compare, let’s look at the skiboot code, which is all in hw/lpc-uart.c:

static void uart_check_tx_room(void)
{
	if (uart_read(REG_LSR) & LSR_THRE) {
		/* FIFO is 16 entries */
		tx_room = 16;
		tx_full = false;
	}
}

The uart_check_tx_room() function is pretty simple, it checks if there’s room in the FIFO and knows that there’s 16 entries. Next, we have a busy loop that waits until there’s room again in the FIFO:

static void uart_wait_tx_room(void)
{
	while (!tx_room) {
		uart_check_tx_room();
		if (!tx_room) {
			smt_lowest();
			do {
				barrier();
				uart_check_tx_room();
			} while (!tx_room);
			smt_medium();
		}
	}
}

Finally, the bit of code that writes the (internal) log buffer out to a serial port:

/*
 * Internal console driver (output only)
 */
static size_t uart_con_write(const char *buf, size_t len)
{
	size_t written = 0;

	/* If LPC bus is bad, we just swallow data */
	if (!lpc_ok() && !mmio_uart_base)
		return written;

	lock(&uart_lock);
	while(written < len) {
		if (tx_room == 0) {
			uart_wait_tx_room();
			if (tx_room == 0)
				goto bail;
		} else {
			uart_write(REG_THR, buf[written++]);
			tx_room--;
		}
	}
 bail:
	unlock(&uart_lock);
	return written;
}

The skiboot code ends up being a bit more complicated thanks to a number of reasons, but the basic algorithm could be applied to the SBE code, and rather than busy waiting for each character to be written out before sending the next into the FIFO, we could just splat things down there and continue with life. So, I put together a patch to try out.
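
To get a feel for why the FIFO matters, here is a toy model in Python that counts how often the writer has to stop and wait for the transmitter to report empty. It is not the SBE patch or the skiboot code, it ignores real timing entirely, and `write_with_fifo` is a name made up for this sketch; a depth of 1 stands in for the per-character behaviour.

```python
def write_with_fifo(data: bytes, depth: int = 16):
    """Model a UART write loop and count the waits for 'transmitter empty'.

    Each time there is no known room left, the writer waits once for the
    FIFO to drain and then assumes `depth` free entries, as in the
    uart_check_tx_room() logic above.
    """
    tx_room = 0
    waits = 0
    sent = bytearray()
    for byte in data:
        if tx_room == 0:
            waits += 1       # one poll-until-empty cycle
            tx_room = depth
        sent.append(byte)
        tx_room -= 1
    return bytes(sent), waits


if __name__ == "__main__":
    message = b"x" * 40
    print("per character:", write_with_fifo(message, depth=1)[1], "waits")
    print("16 entry FIFO:", write_with_fifo(message, depth=16)[1], "waits")
```

In this model a 40 byte message needs 40 waits one character at a time but only 3 with a 16 entry FIFO; how much wall-clock time that saves depends on the baud rate and the polling delay, which the model does not capture.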

Before (i.e. upstream SBE code): it took about 15 seconds from “Welcome to SBE” to “Booting Hostboot”.

Now (with my patch): Around 10 seconds.

It’s a full five seconds (33%) faster to get through the SBE stage of booting. Wow.

Hopefully somebody looks at the pull request sometime soon, as it’s probably useful to a lot of people doing firmware and Operating System development.

So, Happy New Year for Blackbird owners (I’ll publish a build with this and other misc improvements “soon”).

December 29, 2019

Donations 2019

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended). I’m a little late this year due to a new credit card and other stuff distracting me.

I also blog about it to hopefully inspire others. See: 2018, 2017, 2016, 2015

All amounts this year are in $US unless otherwise stated

My main donation was to Givewell (to allocate to projects as they prioritize). Once again I’m happy that Givewell make efficient use of money donated.

I donated $50 each to groups providing infrastructure and advocacy. Wikipedia only got $NZ 50 since they converted to my local currency and I didn’t notice until afterwards.

Some Software Projects. Software in the Public Interest provides admin support for many Open Source projects. Mozilla does the Firefox Browser and other stuff. Syncthing is an Open Source Project that works like Dropbox

Finally I’m still listening to Corey Olsen’s Exploring the Lord of the Rings series (3 years in and about 20% of the way through) plus his other material.

Encoding your WiFi access point password into a QR code

Up until recently, it was a pain to defend against WPA2 brute-force attacks by using a random 63-character password (the maximum in WPA-Personal mode). Thanks to Android 10 and iOS 11 supporting reading WiFi passwords from a QR code, this is finally a practical defense.

Generating the QR code

After installing the qrencode package, run the following:

qrencode -o wifi.png "WIFI:T:WPA;S:<SSID>;P:<PASSWORD>;;"

substituting <SSID> for the name of your WiFi network and <PASSWORD> for the 63-character password you hopefully generated with pwgen -s 63.

If your password includes a colon, then escape it like this:

"WIFI:T:WPA;S:<SSID>;P:pass\:word;;"

since iOS won't support the following (which works fine on Android):

'WIFI:T:WPA;S:<SSID>;P:"pass:word";;'

The only other pitfall I ran into is that if you include a trailing newline character (for example piping echo "..." into qrencode as opposed to echo -n "...") then it will fail on both iOS and Android.
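
If you would rather script this, the sketch below builds the payload string in Python, escaping special characters with a backslash and never adding a trailing newline. The set of characters escaped (backslash, semicolon, comma, double quote and colon) is an assumption based on the commonly documented WIFI: QR format, and the helper names are made up for this example.

```python
import secrets
import string
import subprocess

SPECIAL = '\\;,":'  # characters assumed to need a backslash escape


def escape(value: str) -> str:
    """Backslash-escape the characters that are special in WIFI: payloads."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


def wifi_payload(ssid: str, password: str, auth: str = "WPA") -> str:
    """Build the string to encode, with no trailing newline."""
    return f"WIFI:T:{auth};S:{escape(ssid)};P:{escape(password)};;"


def random_password(length: int = 63) -> str:
    """Generate a random alphanumeric password, similar in spirit to pwgen -s."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def write_qr(payload: str, filename: str = "wifi.png") -> None:
    """Call qrencode with the payload as an argument (needs qrencode installed)."""
    subprocess.run(["qrencode", "-o", filename, payload], check=True)
```

Passing the payload as a command-line argument rather than piping it through `echo` sidesteps the trailing newline problem described above.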

Scanning the QR code

On iOS, simply open the camera app and scan the QR code to bring up a notification which allows you to connect to the WiFi network:

On Android, go into the WiFi settings and tap on the WiFi network you want to join:

then click the QR icon in the password field and scan the code:

In-browser alternative

If you can't do this locally for some reason, there is also an in-browser QR code generator with source code available.

December 27, 2019

Coming to grips with Kubernetes in 2020: online training

There are a few online training resources I’ve had a play with while learning Kubernetes, so I figure that’s worth a quick write up. This is a follow on from my post about Kubernetes podcasts I’ve tried. I’ve tried three training providers so far:

  • The Linux Foundation Kubernetes course (LFS258 Kubernetes Fundamentals) is probably the “go to” resource for many people, and is often sold bundled with the certification exams. Unfortunately, it is really terrible. It is by far the worst course I’ve seen so far.
  • On the other hand, the Linux Academy Kubernetes course is really good. Its flaw is that you have to sign up to Linux Academy, which provides you with all you can eat courses for a rather steep annual fee.
  • Finally, I discovered Mumshad Mannambeth’s Udemy courses, and frankly they’re excellent. He’s put a huge amount of effort into them and it really shows. Even better, with Udemy’s regular sales you can pick up his three Kubernetes courses (intro, admin certification, and developer certification) for under $50 AUD. There are even plenty of online quizzes.

If I was going to pick a course to try, I’d definitely go with Mumshad.

December 25, 2019

Hacking on Arlec Christmas lights with tasmota

I’m loving the wide array of electrically certified home automation devices we’re seeing now. Light bulbs, sensors, power boards, and even Christmas lights. Specifically Arlec is shipping these app controllable Christmas lights this year, which looked very much like they should work with Tasmota.

(Sorry for the terrible product picture, I can’t find this product online any more, I suspect Bunnings has sold out for the year?)

Specifically, it turns out that these Arlec lights are an ESP8266 which can be flashed with tuya-convert v2 to run tasmota. Once flashed, you can control all of the functions available on the device itself, although there are parts of the protocol I haven’t fully understood yet.

Let’s start off by flashing the device:

  • First off boot your raspberry pi with tuya-convert. I used v2, and I suspect that’s important here so make sure you upgrade if you’re using something old.
  • Next, put the bud lights into programming mode by holding the button on the control box down until the light strand turns off. Release and the strand should start blinking every couple of seconds.
  • Now run the tuya-convert flashing script.
  • Now go to the tasmota-XXXX essid and enter your wifi details into the captive portal. The light strand will now reboot.
  • The light strand should now appear on your wifi, and you can find the IP address and MAC address by asking your DHCP server nicely.

Now for some basic configuration in the web UI. Set the module type to “Tuya MCU (54)”, GPIO1 to “Tuya Tx (107)”, and GPIO3 to “Tuya RX (108)”. You’ll also want to set the usual site-specific configuration options like your MQTT server and so forth.

So what is a “Tuya MCU” when it is at home? Well, it turns out that some tuya devices have an esp8266 which just talks serial to another microcontroller. It kind of makes sense if you already have a microcontroller device you want to make “smart” and you’re super dooper lazy I suppose. There is surprisingly good manufacturer documentation online.

In the case of these bud lights, I have a strong theory that we can basically cycle the modes available by pressing the physical button, but I needed a way to validate that.

You can read more about how these devices work on the tuya protocols page, or you can just jump ahead to the simple programs I wrote to explore these devices. However, a quick summary of the serial protocol spoken between the esp8266 and the MCU is helpful. Packets look like this:

  • Frame header: fixed 2 byte value 0x55aa
  • Version: 0x00
  • Command word: a byte
  • Data length: 2 bytes
  • Function length: 2 bytes
  • Function command: 1 byte
  • Checksum: 1 byte

First off I wrote a simple program to monitor the state of the device registers (called dpIds for “define product ids”). It just uses the tasmota web console to constantly ask for the current state of the dpIds (“SerialSend5 55aa0001000000”, a hard coded MCU control packet) and prints out any changes. Note that for it to work, the web console log level needs to be set to debug.
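
The hard coded packet above can be reproduced from the packet layout. The sketch below assumes the checksum is the sum of all preceding bytes modulo 256, which matches the "55aa0001000000" packet; the function names are invented for this example and are not part of tasmota.

```python
HEADER = bytes.fromhex("55aa")  # fixed 2 byte frame header
VERSION = 0x00


def checksum(frame_without_checksum: bytes) -> int:
    """Sum of every byte from the header onwards, modulo 256 (assumed rule)."""
    return sum(frame_without_checksum) % 256


def build_frame(command: int, data: bytes = b"") -> bytes:
    """Assemble header, version, command word, data length, data, checksum."""
    body = (HEADER + bytes([VERSION, command])
            + len(data).to_bytes(2, "big") + data)
    return body + bytes([checksum(body)])


def parse_frame(frame: bytes) -> dict:
    """Validate a frame and split it into version, command and data."""
    if frame[:2] != HEADER:
        raise ValueError("bad frame header")
    length = int.from_bytes(frame[4:6], "big")
    if len(frame) != 7 + length:
        raise ValueError("length field does not match frame size")
    if checksum(frame[:-1]) != frame[-1]:
        raise ValueError("bad checksum")
    return {"version": frame[2], "command": frame[3],
            "data": frame[6:6 + length]}


if __name__ == "__main__":
    # Command word 0x01 with no data, as sent by the monitoring program.
    print(build_frame(0x01).hex())
```

The hex string printed by `build_frame` is what would go after `SerialSend5` in the tasmota web console.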

Now I can press the button on the device and see what dpIds change. A session looked like this (the notes like “solid white” are things I typed into the terminal as I went):

Clearly dpId 1 (type 1, a boolean) is the power state with 0 being off and 1 being on. This is also easily testable of course. If you send a “TuyaSend1 1,0” to the device using the web console it turns off, and if you send “TuyaSend1 1,1” it turns on. This assignment also maps to the way other Tuya MCU devices are configured, so it seems like a very safe assumption.

dpId 101 (type 4, 1 byte enum) seems to be the mode, walking through the possible values determined:

  • 0: fast pulse
  • 1: twinkle
  • 2: alternate
  • 3: alternate differently
  • 4: alternate and cause epileptic fits
  • 5: double alternate
  • 6: flash
  • 7: solid
  • 8: off
  • 9 onwards: brief all

dpId 107 (type 2, 4 byte value) wasn’t changeable via the physical button, so I wrote another simple script to send a bunch of values to it. It appears to be a brightness control for the white LEDs, with 0 being off and 99 being fully on. I haven’t managed to find a brightness control for the coloured LEDs. The brightness control also doesn’t appear to work in all modes.

dpId 108 (type 2, 4 byte value) remains a mystery to me at this time. It doesn’t seem to change regardless of what values I send it.

dpId 109 (type 4, 1 byte enum) seems to be which strand is on. A value of 0 is just white LEDs, 1 is just coloured LEDs, 2 is all LEDs, and 3 is all LEDs but dimmer.
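The dpIds decoded so far can be collected into a small lookup table. This Python sketch only builds the console command strings; it does not talk to the device. TuyaSend1 and TuyaSend4 are the commands used in this post, while the use of TuyaSend2 for the 4 byte brightness value is an assumption, as are all of the function names.

```python
# Names for the values of dpId 101, as observed above.
MODES = {
    "fast pulse": 0,
    "twinkle": 1,
    "alternate": 2,
    "alternate differently": 3,
    "alternate and cause epileptic fits": 4,
    "double alternate": 5,
    "flash": 6,
    "solid": 7,
    "off": 8,
}

# Names for the values of dpId 109, as observed above.
STRANDS = {"white": 0, "coloured": 1, "all": 2, "all dimmer": 3}


def power_command(on: bool) -> str:
    """dpId 1 is a boolean power state."""
    return f"TuyaSend1 1,{int(on)}"


def mode_command(name: str) -> str:
    """dpId 101 is a 1 byte enum selecting the blink mode."""
    return f"TuyaSend4 101,{MODES[name]}"


def strand_command(name: str) -> str:
    """dpId 109 is a 1 byte enum selecting which strand is lit."""
    return f"TuyaSend4 109,{STRANDS[name]}"


def white_brightness_command(level: int) -> str:
    """dpId 107 is a 4 byte value; 0 is off and 99 is fully on."""
    if not 0 <= level <= 99:
        raise ValueError("brightness must be between 0 and 99")
    # Assumption: TuyaSend2 is the console command for 4 byte values.
    return f"TuyaSend2 107,{level}"


if __name__ == "__main__":
    for command in (power_command(True), mode_command("solid"), strand_command("all")):
        print(command)
```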

It sort of doesn’t matter that I haven’t fully decoded the inner workings of the device, because this is enough information for my use case. All I really want is for all the lights to be on solidly (that is, with no blinking). This is because I use them for lighting under my back pergola and the blinking would be quite annoying.

So how do you wire up Home Assistant to send serial packets to a slave MCU over MQTT? Home Assistant will already control the lights turning on and off because of the default relay implementation for dpId 1. What I need to be able to do is turn on all the lights in a solidly on configuration. For that, I can implement rules like:

>> Rule1
ON Event#0 DO TuyaSend4 101,0 ENDON
ON Event#1 DO TuyaSend4 101,1 ENDON
ON Event#2 DO TuyaSend4 101,2 ENDON
ON Event#3 DO TuyaSend4 101,3 ENDON
ON Event#4 DO TuyaSend4 101,4 ENDON
ON Event#5 DO TuyaSend4 101,5 ENDON
ON Event#6 DO TuyaSend4 101,6 ENDON
ON Event#7 DO TuyaSend4 101,7 ENDON
ON Event#8 DO TuyaSend4 101,8 ENDON
ON Event#9 DO TuyaSend4 101,9 ENDON

>> Rule1 ON

>> Rule2
ON Power1#state=1 DO TuyaSend1 1,1 ENDON
ON Power1#state=1 DO TuyaSend4 109,2 ENDON

>> Rule2 ON

You apply these rules by pasting them into the web console on the device. Note that there are four separate pasted commands. Rule1 exposes the modes from dpId 101 as effects in Home Assistant, and Rule2 hooks the power on MQTT command to ensure that the lights are set to solidly on via dpId 109.

The matching Home Assistant configuration looks like this:

# Arlec fairy lights on the back deck
- platform: mqtt
  name: "Back deck 1"
  command_topic: "cmnd/sonoff14/POWER"
  state_topic: "tele/sonoff14/STATE"
  state_value_template: "{{value_json.POWER}}"
  availability_topic: "tele/sonoff14/LWT"
  effect_command_topic: "cmnd/sonoff14/Event"
  effect_list:
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
    - 8
    - 9
  payload_on: "ON"
  payload_off: "OFF"
  payload_available: "Online"
  payload_not_available: "Offline"
  qos: 1
  retain: false

This gives me the ability to turn the fairy lights on and off via Home Assistant, and ensure that they’re solidly on and not blinking. That’s good enough for now.


December 23, 2019

Further thoughts on Azure instance start times


My post from the other day about slow instance starts on Azure caused some commentary (mainly on reddit) that prompted me to think more about all this. In the end, there were a few more experiments I wanted to run to see if I could squeeze more performance out of Azure.

First off, looking at the logs from my initial testing it looks like resource groups are slow. The original terraform creates a resource group as part of the test and then cleans it up at the end. What if instead we had a single permanent resource group and created instances within that?

Here is a series of instance starts and deletes using the terraform from the last post:

You’ll notice that there’s no delete value for the last instance. That’s because terraform crashed and never deleted the instance. You can also see that instance starts are somewhat consistent, except for being slower in the second half of the test than the first, and occasionally spiking out to very very slow. Oh, and deletes are almost always really slow.
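For anyone wanting to collect similar numbers, the measurement boils down to timing a command from start to finish. This is a minimal Python sketch of that idea, not the harness used for these graphs. The function name time_command is made up, and the terraform invocation in the comment is only an example of what you might pass in.

```python
import subprocess
import sys
import time
from typing import Sequence


def time_command(argv: Sequence[str]) -> float:
    """Run a command to completion and return the elapsed wall clock seconds."""
    start = time.monotonic()
    # check=True raises if the command fails, so failures are not silently timed.
    subprocess.run(list(argv), check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For a real measurement,
    # pass something like ["terraform", "apply", "-auto-approve"] instead.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f} seconds")
```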

What happens if we use a permanent resource group and network? This means that all the “instance start terraform” is doing is creating a network interface and then an instance which uses that network interface. It has to be faster, but does it resolve our issues?

The dashed lines are the graph from above, the solid lines are the new data without resource group creation. You can see that abstracting away the resource group work has made a significant performance improvement. Instance start times are now generally under 100 seconds (which is still three times slower than AWS, and four or five times slower than Google).

So is it just that the Australian Azure zones are slow? I re-ran the new terraform against a US datacenter (East US). Here’s a zoom in of just the instance creates with the resource group extracted to make that clearer, for both data centers:

Interestingly, the Australian data center actually performs better than the US one, which isn’t what I would expect at all. You can also see in this test run that we do still see some unexpectedly slow instance launches, although they feel less frequent and smaller when they happen. That might also just be that I’m testing over a weekend and the data center might be more idle.

Looping back, I think we’ve learnt that resource groups are expensive. The last thing I wanted to dig into was what exactly was happening in those spikes where we had resource groups included. Luckily, they were happening about the point I started logging the terraform trace output of the run.

For example, run azure_1576926569_7_0_apply took 18 minutes and 3 seconds to create the instance. For those 18 minutes, terraform logs that the instance was marked by the Azure API as in provisioningState “Creating”. This correlates with operation id c983b272-fa32-4814-b858-adab3da4d9b1 sitting in state “InProgress”. Unfortunately there isn’t a reason logged for why that is, so I guess it’s not possible as an Azure user to work out why things are sometimes slow.

To summarise some advice for terraform users on Azure — don’t create resource groups if you can avoid it. Create global resource groups and then place new objects into them instead. That said, you’re still going to have slower and less consistent performance than other clouds.

Finally, is instance start time a valid metric for cloud performance? Probably not. That said, it is table stakes to be in the conversation. Slow instance starts affect my overall experience of the cloud, as well as the workability of horizontal scaling techniques. This is especially true for instance start times which vary wildly like Azure’s do — I simply can’t trust that I can grow a horizontal scaling set with any sort of reasonable timeframe.


December 22, 2019

Backing up to S3 with Duplicity

Here is how I set up duplicity to use S3 as a backend while giving duplicity the minimum set of permissions to my Amazon Web Services account.

AWS Security Settings

First of all, I enabled the following general security settings in my AWS account:

  • MFA with a U2F device
  • no root user access keys

Then I set a password policy in the IAM Account Settings and turned off all public access in the S3 Account Settings.

Creating an S3 bucket

As a destination for the backups, I created a new backup-foobar S3 bucket keeping all of the default options except for the region which I set to ca-central-1 to ensure that my data would stay in Canada.

The bucket name can be anything you want as long as:

  • it's not already taken by another AWS user
  • it's a valid hostname (i.e. alphanumeric characters or dashes)

Note that I did not enable S3 server-side encryption since I will be encrypting the backups client-side using the support built into duplicity instead.

Creating a restricted user account

Then I went back into the Identity and Access Management console and created a new DuplicityBackup policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::backup-foobar",
                "arn:aws:s3:::backup-foobar/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        }
    ]
}

It's unfortunate that the unrestricted s3:ListAllMyBuckets permission has to be granted, but in my testing, duplicity would error out without it. No other permissions were needed.
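If you have more than one bucket to back up to, the same policy can be generated rather than edited by hand. This Python sketch simply reproduces the policy above for a given bucket name; the function name duplicity_policy is made up for this example and nothing here calls AWS.

```python
import json

# The bucket-scoped permissions from the policy above.
BUCKET_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucketMultipartUploads",
    "s3:AbortMultipartUpload",
    "s3:CreateBucket",
    "s3:ListBucket",
    "s3:DeleteObject",
    "s3:ListMultipartUploadParts",
]


def duplicity_policy(bucket: str) -> dict:
    """Build the DuplicityBackup policy document for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                # Both the bucket itself and every object inside it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(duplicity_policy("backup-foobar"), indent=4))
```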

The next step was to create a new DuplicityBackupHosts IAM group to which I attached the DuplicityBackup policy.

Finally, I created a new machinename IAM user:

  • Access: programmatic only
  • Group: DuplicityBackupHosts
  • Tags: duplicity=1

and wrote down the access key and the access key secret.

Duplicity settings

With that all set, I was able to run duplicity with the following options:

  • --s3-use-new-style: apparently required on non-US regions
  • --s3-use-ia: recommended pricing structure for backups
  • --s3-use-multiprocessing: speeds up uploading of backup chunks

and used the following remote URL:

s3://s3.ca-central-1.amazonaws.com/backup-foobar/machinename

which hardcodes the region in order to work around the lack of explicit region support in duplicity.

I ended up with the following command:

http_proxy= AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<access_key_secret> PASSPHRASE=<password> duplicity --s3-use-new-style --s3-use-ia --s3-use-multiprocessing --no-print-statistics --verbosity 1 --exclude-device-files --exclude-filelist <exclude_file> --include-filelist <include_file> --exclude '**' / <remote_url>

where <exclude_file> is a file which contains the list of paths to keep out of my backup:

/etc/.git
/home/francois/.cache

<include_file> is a file which contains the list of paths to include in the backup:

/etc
/home/francois
/usr/local/bin
/usr/local/sbin
/var/log/apache2
/var/www

and <password> is a long random string (pwgen -s 64) used to encrypt the backups.

Backup script

Here are two other things I included in my backup script prior to the actual backup line listed in the previous section.

The first one deletes files related to failed backups:

http_proxy= AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<access_key_secret> PASSPHRASE=<password> duplicity cleanup --verbosity 1 --force <remote_url>

and the second deletes old backups (older than 12 days in this example):

http_proxy= AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<access_key_secret> PASSPHRASE=<password> duplicity remove-older-than 12D --verbosity 1 --force <remote_url>
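Put together, the script runs three duplicity commands in order: cleanup, remove-older-than, then the backup itself. The following Python sketch builds those three argument lists using only the options shown in this post. It does not run anything, and the function and parameter names are made up for illustration.

```python
def duplicity_commands(remote_url, include_file, exclude_file, keep="12D"):
    """Return the cleanup, prune and backup commands, in the order they run."""
    quiet = ["--verbosity", "1"]
    cleanup = ["duplicity", "cleanup", *quiet, "--force", remote_url]
    prune = ["duplicity", "remove-older-than", keep, *quiet, "--force", remote_url]
    backup = [
        "duplicity",
        "--s3-use-new-style",
        "--s3-use-ia",
        "--s3-use-multiprocessing",
        "--no-print-statistics",
        *quiet,
        "--exclude-device-files",
        "--exclude-filelist", exclude_file,
        "--include-filelist", include_file,
        "--exclude", "**",
        "/",
        remote_url,
    ]
    return [cleanup, prune, backup]


if __name__ == "__main__":
    url = "s3://s3.ca-central-1.amazonaws.com/backup-foobar/machinename"
    # Hypothetical file list paths, for display only.
    for command in duplicity_commands(url, "/etc/backup-include", "/etc/backup-exclude"):
        print(" ".join(command))
```

To actually run them you would pass each list to something like subprocess.run along with an environment containing AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and PASSPHRASE, as in the commands above.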

Feel free to leave a comment if I forgot anything that might be useful!

December 21, 2019

Why is Azure so slow to start instances?


I’ve been playing with terraform recently, and decided to see how different the terraform for launching a simple Ubuntu instance in various clouds is. There are two big questions there for me — how big is the variation between OpenStack derived clouds; and how painful is it to move between the proprietary clouds? Part of this is because terraform doesn’t present a standardised layer of cloud functionality, it has a provider per cloud.

(Although, I suspect there’s nothing stopping someone from writing a libcloud provider or something like that. It is an interesting idea which requires some additional thought.)

My terraform implementations for each cloud are on github if you’re interested. I don’t want to spend a lot of analysis on the actual terraform, because I think the really interesting thing I found isn’t where I expected it to be (there’s a hint in the title for this post). That said, the OpenStack clouds vary mostly by capabilities. vexxhost for example seems to only offer flavors that require boot-from-volume. The proprietary clouds are complete re-writes, but are generally relatively simple and well documented.

However, that interesting accidental thing — as best as I can tell, Microsoft Azure is really really slow to launch instances. The graph below presents five instance launches on each cloud I tested:

As you can see, Vault, Vexxhost, and AWS are basically all in the same ballpark. Google and Azure are outliers, with Google being crazy fast (but also very slow to delete instances, a metric not presented here), and Azure being more than three times slower than everyone else.

Instance launch time isn’t a great metric to be honest, but it does matter. For example if you were trying to autoscale a web tier or a kubernetes cluster, then waiting over two minutes just for the instance to boot before it can be configured and added to the cluster is probably not ok.

I wonder why Azure is so slow?

I did some further exploring after writing this post and was able to improve performance by changing how I handled resource groups in the terraform. The performance still isn’t great though. You can read more about that in a separate post if you’d like.


December 19, 2019

Red Hat crippling CloudForms product and migrating users to IBM

CloudForms is Red Hat’s supported version of upstream ManageIQ, an infrastructure management platform. It lets you see, manage and deploy to various platforms like OpenStack, VMware, RHEV, OpenShift and public cloud like AWS and Azure, with a single pane of glass view across them all. It has its own orchestration engine but also integrates with Ansible for automated deployments.

As best I can tell, their CloudForms updated Statement of Direction article (behind paywall, sorry) shows that Red Hat is killing off support for non-Red Hat platforms like VMware, AWS, Azure, etc. The justification is to focus on open platforms, which I think means CloudForms will ultimately disappear entirely with Red Hat focusing on OpenShift instead.

We made a strategic decision to focus our management strategy on the future — open, cloud-native environments that promote portability across on-premise, private and public clouds.

CloudForms updated Statement of Direction

However, to me this is still a big blow to users of the platform, where I’m sure most will have at least some VMware to manage. Indeed, when implementing CloudForms at work and talking to Red Hat, they said that their most mature integration in CloudForms is with VMware.

According to the Red Hat article, CloudForms with full platform support is being embedded into IBM Cloud Pak for Multicloud Management and users are encouraged to “migrate your Red Hat CloudForms subscriptions to IBM Cloud Pak for Multicloud Management licenses.” Red Hat’s CloudForms Statement of Direction FAQ article lays out the migration path, which does confirm Red Hat will continue to support existing clients for the remainder of their subscription.

So in short, CloudForms from Red Hat is being crippled and will only support Red Hat products, which really means that users are being forced to buy IBM instead. Of course Red Hat is entitled to change their own products, but this move does seem curious when execs on both sides said they would remain independent. Maybe it’s better than killing CloudForms outright?

We can publicly say that all our products will survive in their current form and continue to grow. We will continue to support all our products; we’re separate entities and we’re going to have separate contracts, and there is no intention to de-emphasise any of our products and we’ll continue to invest heavily in it.

Jim Whitehurst as Red Hat CEO

December 18, 2019

KVM guests with emulated SSD and NVMe drives

Sometimes when you’re using KVM guests to test something, perhaps like a Ceph or OpenStack Swift cluster, it can be useful to have SSD and NVMe drives. I’m not talking about passing physical drives through, but rather emulating them.

NVMe drives

QEMU supports emulating NVMe drives as arguments on the command line, but it’s not yet exposed to tools like virt-manager. This means you can’t just add a new drive of type nvme into your virtual machine XML definition. However, you can add those qemu arguments to your XML. This also means that the NVMe drives will not show up as drives in tools like virt-manager, even after you’ve added them with qemu. Still, it’s fun to play with!

QEMU command line args for NVMe

Michael Moese has nicely documented how to do this on his blog. Basically, after creating a disk image (raw or qcow2), you add the following two arguments to the qemu command. I use a numeric drive id and serial so that I can add multiple NVMe drives (just duplicate the lines and increment the number).

-drive file=/path/to/nvme1.img,if=none,id=NVME1 \
-device nvme,drive=NVME1,serial=nvme-1
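If you want several NVMe drives, duplicating and renumbering those lines by hand gets tedious. A small Python sketch of the "duplicate and increment" idea follows; the function name nvme_args is made up, and it only builds the argument list rather than launching qemu.

```python
def nvme_args(image_paths):
    """Build -drive/-device argument pairs, numbering drives from 1."""
    args = []
    for number, path in enumerate(image_paths, start=1):
        drive_id = f"NVME{number}"
        args += [
            "-drive", f"file={path},if=none,id={drive_id}",
            "-device", f"nvme,drive={drive_id},serial=nvme-{number}",
        ]
    return args


if __name__ == "__main__":
    for arg in nvme_args(["/path/to/nvme1.img", "/path/to/nvme2.img"]):
        print(arg)
```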

libvirt XML definition for NVMe

To add NVMe to a libvirt guest, add something like this at the bottom of your virtual machine definition (before the closing </domain> tag) to call those same qemu args.

  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='file=/path/to/nvme1.img,format=raw,if=none,id=NVME1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='nvme,drive=NVME1,serial=nvme-1'/>
  </qemu:commandline>

virt-install for NVMe

If you’re spinning up VMs using virt-install, then you can also pass these in as arguments, which will automatically populate the libvirt XML file with the arguments above. Note as above, you do not add a --disk option for NVMe drives.

--qemu-commandline='-drive file=/path/to/nvme1.img,format=raw,if=none,id=NVME1'
--qemu-commandline='-device nvme,drive=NVME1,serial=nvme-1'

Confirming drive is NVMe

Your NVMe drives will show up as specific devices under Linux, like /dev/nvme0n1 and of course you can see them with tools like lsblk and nvme (from nvme-cli package).

Here’s nvme tool listing the NVMe drive in a guest.

sudo nvme list

This should return something that looks like this.

Node          SN      Model           Namespace Usage                   Format         FW Rev  
------------- ------- --------------- --------- ----------------------- -------------- ------
/dev/nvme0n1  nvme-1  QEMU NVMe Ctrl  1         107.37  GB / 107.37  GB 512   B +  0 B 1.0

SSD drives

SSD drives are slightly different. Simply add a drive to your guest as you normally would, on the bus you want to use (for example, SCSI or SATA). Then, add the required set command to set rotational speed to make it an SSD (note that you set it to 1 in qemu, which sets it to 0 in Linux).

This does require you to know the name of the device, which depends on how many drives of that type you add. It generally follows a predictable format, such as scsi0-0-0-0 for the first SCSI drive on the first SCSI controller and sata0-0-0 for SATA, but it’s good to confirm.

You can determine the exact name for your drive by querying the guest with virsh qemu-monitor-command, like so.

virsh qemu-monitor-command --hmp 1 "info qtree"

This will provide details showing the devices, buses and connected drives. Here’s an example for the first SCSI drive, where you can see it’s scsi0-0-0-0.

                  dev: scsi-hd, id "scsi0-0-0-0"
                    drive = "drive-scsi0-0-0-0"
                    logical_block_size = 512 (0x200)
                    physical_block_size = 512 (0x200)
                    min_io_size = 0 (0x0)
                    opt_io_size = 0 (0x0)
                    discard_granularity = 4096 (0x1000)
                    write-cache = "on"
                    share-rw = false
                    rerror = "auto"
                    werror = "auto"
                    ver = "2.5+"
                    serial = ""
                    vendor = "QEMU"
                    product = "QEMU HARDDISK"
                    device_id = "drive-scsi0-0-0-0"
                    removable = false
                    dpofua = false
                    wwn = 0 (0x0)
                    port_wwn = 0 (0x0)
                    port_index = 0 (0x0)
                    max_unmap_size = 1073741824 (0x40000000)
                    max_io_size = 2147483647 (0x7fffffff)
                    rotation_rate = 1 (0x1)
                    scsi_version = 5 (0x5)
                    cyls = 16383 (0x3fff)
                    heads = 16 (0x10)
                    secs = 63 (0x3f)
                    channel = 0 (0x0)
                    scsi-id = 0 (0x0)
                    lun = 0 (0x0)

QEMU command for SSD drive

When using qemu, add your drive as usual and then add the set option. Using the SCSI drive example from above (which is on scsi0-0-0-0), this is what it would look like.

-set device.scsi0-0-0-0.rotation_rate=1
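With more than a couple of drives it can be handy to pull the device ids out of the info qtree output and generate the set options from them. This Python sketch assumes disk devices appear as lines of the form dev: <type>, id "<id>" with a type ending in -hd, as in the scsi-hd example above; other device types may need different handling. The function names are made up for this example.

```python
import re

# Matches lines such as: dev: scsi-hd, id "scsi0-0-0-0"
DEV_LINE = re.compile(r'dev: ([\w-]+), id "([^"]*)"')


def disk_ids(qtree_text):
    """Return the ids of hard disk devices found in info qtree output."""
    ids = []
    for dev_type, dev_id in DEV_LINE.findall(qtree_text):
        # Assumption: emulated hard disks have a type ending in "-hd".
        if dev_type.endswith("-hd") and dev_id:
            ids.append(dev_id)
    return ids


def ssd_set_args(ids):
    """Build the -set arguments that mark each device as non-rotational."""
    args = []
    for dev_id in ids:
        args += ["-set", f"device.{dev_id}.rotation_rate=1"]
    return args


if __name__ == "__main__":
    sample = 'dev: scsi-hd, id "scsi0-0-0-0"\n  drive = "drive-scsi0-0-0-0"\n'
    print(ssd_set_args(disk_ids(sample)))
```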

libvirt XML definition for SSD drive

Similarly, for a defined guest, add the set argument like we did for NVMe drives, that is at the bottom of the XML, before the closing </domain> tag.

  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.scsi0-0-0-0.rotation_rate=1'/>
  </qemu:commandline>

If your machine has NVMe drives specified also, just add the set args for the SSD, don’t add a second qemu:commandline section. It should look something like this.

  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.scsi0-0-0-0.rotation_rate=1'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='file=/var/lib/libvirt/images/rancher-vm-centos-7-00-nvme.qcow2,format=qcow2,if=none,id=NVME1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='nvme,drive=NVME1,serial=nvme-1'/>
  </qemu:commandline>

virt-install command for SSD drive

When spinning up a machine using virt-install, add a drive as normal. The only thing you have to add is the argument for the qemu set command. Here’s that same SCSI example.

--qemu-commandline='-set device.scsi0-0-0-0.rotation_rate=1'

Confirming drive is an SSD

You can confirm the rotational speed with lsblk, like so.

sudo lsblk -d -o name,rota

This will return either 0 (for rotational speed false, meaning SSD) or 1 (for rotating drives, meaning non-SSD). For example, here’s a bunch of drives on a KVM guest where you can see /dev/sda and /dev/nvme0n1 are both SSDs.

NAME    ROTA
sda        0
sdb        1
sr0        1
vda        1
nvme0n1    0
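If you are checking this from a test script rather than by eye, the lsblk output is easy to parse. Here is a minimal Python sketch that picks out the non-rotational devices from output in the two column format shown above; the function name ssd_devices is made up for this example.

```python
SAMPLE = """\
NAME    ROTA
sda        0
sdb        1
sr0        1
vda        1
nvme0n1    0
"""


def ssd_devices(lsblk_output):
    """Return device names whose ROTA column is 0 (non-rotational)."""
    names = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        # The header line is skipped because its second field is "ROTA", not "0".
        if len(fields) == 2 and fields[1] == "0":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    print(ssd_devices(SAMPLE))
```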

You can also check with smartctl, which will report the rotation rate as a Solid State Device. Here’s an example on /dev/sda which is set to be an SSD in a KVM guest.

smartctl -i /dev/sda

This shows a result like this. Note that Rotation Rate is Solid State Device.

=== START OF INFORMATION SECTION ===
Vendor:               QEMU
Product:              QEMU HARDDISK
Revision:             2.5+
Compliance:           SPC-3
User Capacity:        107,374,182,400 bytes [107 GB]
Logical block size:   512 bytes
LU is thin provisioned, LBPRZ=0
Rotation Rate:        Solid State Device
Device type:          disk
Local Time is:        Wed Dec 18 17:52:18 2019 AEDT
SMART support is:     Unavailable - device lacks SMART capability.

So that’s it! Thanks to QEMU you can play with NVMe and SSD drives in your guests.

rfid and hrfid

I was staring at some assembly recently, and not for the first time encountered rfid and hrfid, two instructions that we use when doing things like returning to userspace, returning from OPAL to the kernel, or from a host kernel into a guest.

rfid copies various bits from the register SRR1 (Machine Status Save/Restore Register 1) into the MSR (Machine State Register), and then jumps to an address given in SRR0 (Machine Status Save/Restore Register 0). hrfid does something similar, using HSRR0 and HSRR1 (Hypervisor Machine Status Save/Restore Registers 0/1), and slightly different handling of MSR bits.

The various Save/Restore Registers are used to preserve the state of the CPU before jumping to an interrupt handler, entering the kernel, etc, and are set up as part of instructions like sc (System Call), by the interrupt mechanism, or manually (using instructions like mtsrr1).

Anyway, the way in which rfid and hrfid restore MSR bits is documented somewhat obtusely in the ISA (if you don't believe me, look it up), and I was annoyed by this, so here, have a more useful definition. Leave a comment if I got something wrong.

rfid - Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from SRR1 into the MSR, with the following exceptions:

  • MSR_3 (HV, Hypervisor State) = MSR_3 & SRR1_3
    [We won't put the thread into hypervisor state if we're not already in hypervisor state]

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or SRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = SRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = SRR1_31

    [See the ISA description for explanation on how rfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = SRR1_48 | SRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_51 (ME, Machine Check Interrupt Enable) = (MSR_3 (HV, Hypervisor State) & SRR1_51) | ((! MSR_3) & MSR_51)
    [If we're not already in hypervisor state, we won't alter ME]

  • MSR_58 (IR, Instruction Relocate) = SRR1_58 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = SRR1_59 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = SRR0_0:61 || 0b00
    [Jump to SRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]
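For those who find code easier to read than bit lists, here is the rfid definition above transcribed into a Python sketch. It is a direct reading of the rules in this post, not of the ISA itself. It assumes that every MSR value on the right-hand side refers to the MSR before the instruction executes, and it ignores the reserved bits. Bits use IBM numbering, where bit 0 is the most significant bit of the 64-bit register.

```python
def bit(n):
    """Mask for bit n of a 64-bit register in IBM numbering (bit 0 is the MSB)."""
    return 1 << (63 - n)


HV, EE, PR, ME, IR, DR = bit(3), bit(48), bit(49), bit(51), bit(58), bit(59)
TS_TM = bit(29) | bit(30) | bit(31)  # TS is bits 29:30, TM is bit 31
TS_SUSPENDED_NO_TM = bit(30)  # the 0b010 pattern in bits 29:31


def rfid_msr(msr, srr1):
    """Return the MSR after rfid, given the current MSR and SRR1."""
    # Start with a copy of SRR1, minus the fields that get special handling.
    new = srr1 & ~(HV | TS_TM | ME)
    # HV can stay set but can never be newly set.
    new |= msr & srr1 & HV
    # TS and TM are copied unless MSR is 0b010 and SRR1 is 0b000.
    if (msr & TS_TM) != TS_SUSPENDED_NO_TM or (srr1 & TS_TM) != 0:
        new |= srr1 & TS_TM
    else:
        new |= msr & TS_TM
    # ME is only taken from SRR1 when already in hypervisor state.
    new |= (srr1 if msr & HV else msr) & ME
    # Going into problem state forces EE, IR and DR on.
    if srr1 & PR:
        new |= EE | IR | DR
    return new


def rfid_nia(srr0):
    """Next instruction address: SRR0 with the low two bits cleared."""
    return srr0 & ~0b11
```

For example, calling rfid_msr with a current MSR of zero and an SRR1 that has HV and PR set yields PR, EE, IR and DR set with HV left clear, since the thread was not already in hypervisor state.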

hrfid - Hypervisor Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from HSRR1 into the MSR, with the following exceptions:

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or HSRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = HSRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = HSRR1_31

    [See the ISA description for explanation on how hrfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = HSRR1_48 | HSRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_58 (IR, Instruction Relocate) = HSRR1_58 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = HSRR1_59 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = HSRR0_0:61 || 0b00
    [Jump to HSRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]

December 17, 2019

A close-to-upstream firmware build for the Raptor Blackbird

It goes without saying that using this build is At Your Own Risk and I make zero warranty. AFAIK it can’t physically destroy your system.

My GitHub op-build branch stewart-blackbird-v1 has all the changes built into this build (the VERSION displayed in firmware will be slightly weird as I did the tagging afterwards… this is not meant to be “howto release firmware to the public”). Follow op-build pull 3341 for the state of upstreaming everything.

Binaries are over at https://www.flamingspork.com/blackbird/stewart-blackbird-v1-images/ (see the git branch of op-build for source).

To flash it (temporarily), grab blackbird.pnor, get it to /tmp on your BMC and follow the instructions I posted the other day.

I’d be interested in any feedback on what does/does not work.

December 15, 2019

Are you Fans of the Blackbird? Speak up, I can’t hear you over the fan.

So, as of yesterday, I started running a pretty-close-to-upstream op-build host firmware stack on my Blackbird. Notable yak-shaving has included:

Apart from that, I was all happy as Larry. Except then I went into the room with the Blackbird in it and went “huh, that’s loud”, and since it was bedtime, I decided it could all wait until the morning.

It is now the morning. Checking fan speeds over IPMI, one fan stood out (fan2, sitting at 4300RPM). This was a bit of a surprise as what’s silkscreened on the board is that the rear case fan is hooked up to “fan2”, and if we had a “start from 0/1” mix up, it’d be the front case fan. I had just assumed it’d be maybe OCC firmware dying or something, but this wasn’t the case (I checked – thanks occtoolp9!).

After a bit of digging around, I worked out this mapping:

IPMI fan0    Rear Case Fan     Motherboard Fan 2
IPMI fan1    Front Case Fan    Motherboard Fan 3
IPMI fan2    CPU Fan           Motherboard Fan 1

Which is about as surprising and confusing as you’d think.

After a bunch of digging around the Raptor ports of OpenBMC and Hostboot, it seems that the IPL Observer which is custom to Raptor controls if the BMC decides to do fan control or not.

You can get its view of the world from the BMC via the (incredibly user friendly) poking at DBus:

busctl get-property org.openbmc.status.IPL /org/openbmc/status/IPL org.openbmc.status.IPL current_status; busctl get-property org.openbmc.status.IPL /org/openbmc/status/IPL org.openbmc.status.IPL current_istep

Which if you just have the Hostboot patch in (like I first did) you end up with:

s "IPL_RUNNING"
s "21,3"

Which is where Hostboot exits the IPL process (as you see on the screen) and hands over to skiboot. But if you start digging through their op-build tree, you find that there’s a signal_linux_start_complete script which calls pnv-lpc to write two values to LPC ports 0x81 and 0x82. The pnv-lpc utility is the external/lpc/ binary from skiboot, and these two ports are the “extended lpc port 80h” state.

So, to get back fan control? First, build the lpc utility:

git clone git@github.com:open-power/skiboot.git
cd skiboot/external/lpc
make

and then poke the magic values of “IPL complete and linux running”:

$ sudo ./lpc io 0x81.b=254
[io] W 0x00000081.b=0xfe
$ sudo ./lpc io 0x82.b=254
[io] W 0x00000082.b=0xfe

You get a friendly beep, and then your fans return to sanity.

Of course, for that to work you need to have debugfs mounted, as this pokes OPAL debugfs to do direct LPC operations.

Next up: think of a smarter way to trigger that than “stewart runs it on the command line”. Also next up: work out the better way to determine that fan control should be on and patch the BMC.

Automatically updating containers with Docker

Running something in a container using Docker or Podman is cool, but maybe you want an automated way to always run the latest container? Using the :latest tag alone does not do this; that just pulls the latest container at the time. You could have a cronjob that just always pulls the latest containers and restarts the container, but then if there’s no update you have an outage for no reason.

It’s not too hard to write a script to pull the latest container and restart the service only if required, then tie that together with a systemd timer.

To restart a container you need to know how it was started. If you have only one container then you could just hard-code it, however it gets more tricky to manage if you have a number of containers. This is where something like runlike can help!

First, start up your container however you need to (OwnTracks recorder, for example).

Next, let’s install runlike with pip.

sudo pip install runlike

Now, let’s create a simple script that takes one optional argument, the name of a running container. If the argument is omitted, it will default to all containers. The script will check if the latest image is different to the running image, and if so, restart the container using the new image with the same arguments as before (determined by runlike). If there is no newer image, then it will just leave the running container alone.

Create the script.

cat << \EOF | sudo tee /usr/local/bin/update-containers.sh
#!/bin/bash

# Abort on all errors (equivalent to set -e)
set -o errexit

# Get the containers from first argument, else get all containers
CONTAINER_LIST="${1:-$(docker ps -q)}"

for container in ${CONTAINER_LIST}; do
  # Get the image and hash of the running container
  CONTAINER_IMAGE="$(docker inspect --format "{{.Config.Image}}" --type container "${container}")"
  RUNNING_IMAGE="$(docker inspect --format "{{.Image}}" --type container "${container}")"

  # Pull in latest version of the container and get the hash
  docker pull "${CONTAINER_IMAGE}"
  LATEST_IMAGE="$(docker inspect --format "{{.Id}}" --type image "${CONTAINER_IMAGE}")"

  # Restart the container if the image is different
  if [[ "${RUNNING_IMAGE}" != "${LATEST_IMAGE}" ]]; then
    echo "Updating ${container} image ${CONTAINER_IMAGE}"
    DOCKER_COMMAND="$(runlike "${container}")"
    docker rm --force "${container}"
    eval "${DOCKER_COMMAND}"
  fi
done
EOF

Make the script executable.

sudo chmod a+x /usr/local/bin/update-containers.sh

You can test the script by just running it.

/usr/local/bin/update-containers.sh
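If you’d rather see what the script would decide before letting it restart anything, the comparison at its heart can be pulled out into a small function. This is only a sketch: needs_update and the sample image IDs are made up for illustration and are not part of the script above.

```shell
# Hypothetical helper (not part of the script above): succeed only when
# the running image ID differs from the freshly pulled one.
needs_update() {
  # $1 = image ID the container is running, $2 = image ID after the pull
  [ -n "$2" ] && [ "$1" != "$2" ]
}

# Made-up IDs, just to show the behaviour.
if needs_update "sha256:aaaa" "sha256:bbbb"; then
  echo "would restart"
else
  echo "up to date"
fi
```

For a real dry run you would feed it the two docker inspect values the script already collects and echo the result instead of calling docker rm.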

Now that you have a script which will check for new images and update containers, let’s make a systemd service and timer for it. This way you can schedule regular update checks whenever you like. If you want a service for a specific container, just add the container name as an argument to the script in ExecStart.

First, create the service

cat << EOF | sudo tee /etc/systemd/system/update-containers.service 
[Unit]
Description=Update containers
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/update-containers.sh
EOF

Next, create the matching timer unit (note that the service and timer names need to match).

cat << EOF | sudo tee /etc/systemd/system/update-containers.timer 
[Unit]
Description=Timer for updating containers
Wants=network-online.target

[Timer]
OnActiveSec=24h
OnUnitActiveSec=24h

[Install]
WantedBy=timers.target
EOF

Reload systemd to pick up the new service and enable the timer.

sudo systemctl daemon-reload
sudo systemctl start update-containers.timer
sudo systemctl enable update-containers.timer

You can check the status of the timer and the service using standard systemd tools.

sudo systemctl status update-containers.timer
sudo systemctl status update-containers.service
sudo journalctl -u update-containers.service

That’s it! Sit back and let your containers be automatically updated for you. If you want to manually update a container, you could just use version tags and manage them separately.


Enabling Docker in Fedora 31 by reverting to cgroups v1

Fedora has switched to cgroups v2 by default now, but Docker doesn’t yet support it and so fails to start. If you want to use Docker then you need to revert cgroups to v1 by adding the systemd.unified_cgroup_hierarchy=0 kernel argument.

Add systemd.unified_cgroup_hierarchy=0 to the default GRUB config with sed.

sudo sed -i '/^GRUB_CMDLINE_LINUX/ s/"$/ systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
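One catch with that sed one-liner is that running it twice appends the argument twice. A guarded version might look like the sketch below; add_cgroup_arg is a name I made up, and it is pointed at a throwaway file here rather than the real /etc/default/grub.

```shell
# Hypothetical wrapper: append the kernel argument only if it is missing.
# $1 = path to a GRUB defaults file (a copy, for a dry run)
add_cgroup_arg() {
  arg="systemd.unified_cgroup_hierarchy=0"
  if grep -q "^GRUB_CMDLINE_LINUX.*${arg}" "$1"; then
    echo "already set"
  else
    # Same substitution as the one-liner above
    sed -i '/^GRUB_CMDLINE_LINUX/ s/"$/ '"${arg}"'"/' "$1"
    echo "added"
  fi
}

# Try it on a temporary file with a sample line.
tmp="$(mktemp)"
echo 'GRUB_CMDLINE_LINUX="rhgb quiet"' > "${tmp}"
add_cgroup_arg "${tmp}"   # first run adds the argument
add_cgroup_arg "${tmp}"   # second run leaves the file alone
cat "${tmp}"
rm -f "${tmp}"
```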

Now rebuild your GRUB config.

If you’re using BIOS boot then it’s this.

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

If you’re running EFI, then it’s this.

sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

Now reboot and make sure Docker can start!
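If Docker still refuses to start after the reboot, it’s worth checking that the argument actually made it onto the kernel command line. A rough check, assuming nothing beyond /proc/cmdline (the function name is my own):

```shell
# Succeed if the kernel command line in $1 contains the cgroups v1 switch.
has_cgroup_v1_arg() {
  case " $1 " in
    *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# /proc/cmdline holds the arguments the running kernel was booted with.
if has_cgroup_v1_arg "$(cat /proc/cmdline 2>/dev/null)"; then
  echo "cgroups v1 argument is active"
else
  echo "cgroups v1 argument not found; was the GRUB config rebuilt?"
fi
```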

OwnTracks recorder in a container on Fedora with Let’s Encrypt and nginx

OwnTracks Recorder is a web application which maps locations over time. Generally, it connects to an MQTT server and subscribes to owntracks/+ topics for any location updates, but it also has a built-in function to receive updates over HTTP.

I have been using OwnTracks with MQTT for a while, but found it to be too unreliable on Android (disconnects in the background and doesn’t reconnect nicely). Using HTTP is supposed to be more reliable, so this is how I set it up. The idea is to use OwnTracks on Android to post directly to the OwnTracks recorder over HTTP instead of MQTT and have recorder post the MQTT messages on our behalf using Lua scripts (for Home Assistant).

Friends is an important feature (to let members of the family see where each other is located) and fortunately it is supported in HTTP mode (but it requires a little bit more configuration).

nginx and base configuration

We will use nginx as a reverse proxy in front of the recorder to provide both TLS and authentication to keep the service private and secure.

sudo dnf install nginx httpd-tools

Configure nginx to proxy to OwnTracks recorder by creating a new config file for the domain you are hosting on. For example, if your domain is owntracks.yourdomain.com then create a file at /etc/nginx/conf.d/owntracks.yourdomain.com.conf. Later certbot will update this to add TLS configuration.

cat << \EOF |sudo tee /etc/nginx/conf.d/owntracks.yourdomain.com.conf
server {
  server_name owntracks.yourdomain.com;
  root /var/www/html;

  auth_basic "OwnTracks";
  auth_basic_user_file /etc/nginx/owntracks.htpasswd;
  proxy_set_header X-Limit-U $remote_user;

  location / {
    proxy_pass http://127.0.0.1:8083/;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }

  location /ws {
    rewrite ^/(.*) /$1 break;
    proxy_pass http://127.0.0.1:8083;
    proxy_http_version  1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

  location /view/ {
    proxy_buffering off;
    proxy_pass http://127.0.0.1:8083/view/;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }
  location /static/ {
    proxy_pass http://127.0.0.1:8083/static/;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }

  location /pub {
    proxy_pass http://127.0.0.1:8083/pub;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }

  error_page 404 /404.html;
  location = /404.html {
  }

  error_page 500 502 503 504 /50x.html;
  location = /50x.html {
  }
}
EOF

Now that we have a web server let’s open the ports to enable traffic on port 80 and 443.

sudo firewall-cmd --zone=FedoraServer --add-service=http
sudo firewall-cmd --zone=FedoraServer --add-service=https
sudo firewall-cmd --runtime-to-permanent

SELinux will block nginx from acting as a proxy and connecting to our other services, so we need to tell it that it’s OK.

sudo setsebool -P httpd_can_network_connect 1
sudo setsebool -P httpd_can_network_relay 1

Note that the config points nginx at a password file to protect recorder; now we need to create that file.

Let’s pretend that we have three users, Alice, Bob and Charlie. Create the nginx password file when you add the password for Alice, then add the password for the other two.

sudo htpasswd -c /etc/nginx/owntracks.htpasswd alice
sudo htpasswd /etc/nginx/owntracks.htpasswd bob
sudo htpasswd /etc/nginx/owntracks.htpasswd charlie
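To double-check who ended up in the password file, you can list just the user names. The htpasswd format is one user:hash entry per line, so a sketch like this does the job (list_htpasswd_users is a made-up name, and the demo uses a throwaway file with placeholder hashes):

```shell
# Print the user names stored in an htpasswd-style file ($1).
# Each line has the form "user:hash", so the name is the first field.
list_htpasswd_users() {
  cut -d: -f1 "$1"
}

# Demonstration with a temporary file and placeholder hashes.
demo="$(mktemp)"
printf '%s\n' 'alice:PLACEHOLDER' 'bob:PLACEHOLDER' 'charlie:PLACEHOLDER' > "${demo}"
list_htpasswd_users "${demo}"
rm -f "${demo}"
```

Against the real file you would need sudo, since it lives under /etc/nginx.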

That’s the core nginx config done. Next we will use certbot to get a certificate and re-configure nginx to use TLS.

Certbot

Install certbot and the nginx plugin, which will let us get signed certificates from Let’s Encrypt. Using the plugin means it will configure nginx to handle the challenge and write the config file automatically. You will need to make sure that port 80 on your nginx server is available over the Internet (and probably also port 443 so that we can connect securely to recorder remotely) as well as a DNS entry pointing to your external IP (I’ll use owntracks.yourdomain.com as an example).

sudo dnf install certbot python3-certbot-nginx

Next, use certbot to get TLS certificates from Let’s Encrypt. Follow the prompts and be sure to enable TLS redirection so that all traffic will be encrypted.

sudo certbot --agree-tos \
--redirect \
--rsa-key-size 4096 \
--nginx \
-d owntracks.yourdomain.com

Now that we have a certificate, let’s enable auto renewals.

sudo systemctl enable --now certbot-renew.timer

OK, nginx should now be configured with TLS and managed by certbot.

Recorder with Docker

Now let’s get the recorder container going! First install and prepare Docker. Note that if you’re running on Fedora 31 or later, you need to revert to cgroup v1 first.

sudo groupadd -r docker
sudo gpasswd -a ${USER} docker
newgrp docker
sudo dnf install -y cockpit-docker docker
sudo systemctl start docker
sudo systemctl enable docker

Next let’s prepare the configuration and scripts for the container.

sudo mkdir -p /var/lib/owntracks/{config,scripts,logs}

Generally we pass variables into containers, but recorder also supports a config file so we’ll use that instead (OTR_LUASCRIPT is not supported as a variable, anyway). Replace the values for your MQTT server below.

NOTE: OTR_PORT must be a number, not a string, or else it will be ignored.

OTR_HOST="mqtt-broker"
OTR_PORT=mqtt-port
OTR_USER="mqtt-user"
OTR_PASS="mqtt-user-password"

cat << EOF | sudo tee /var/lib/owntracks/config/recorder.conf
OTR_TOPICS = "owntracks/#"
OTR_HTTPHOST = "0.0.0.0"
OTR_STORAGEDIR = "/store"
OTR_HTTPLOGDIR = "/logs"
OTR_LUASCRIPT = "/scripts/hook.lua"
OTR_HOST = "${OTR_HOST}"
OTR_PORT = ${OTR_PORT}
OTR_USER = "${OTR_USER}"
OTR_PASS = "${OTR_PASS}"
OTR_CLIENTID = "owntracks-recorder"
EOF
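Given the note about OTR_PORT, a quick sanity check on the value before writing the config can save some head scratching. This is just a sketch with a helper name of my own; 1883 is the usual unencrypted MQTT port and is only there as an example.

```shell
# Succeed only if $1 is a whole number in the valid TCP port range.
is_valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit (quotes included)
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

for candidate in 1883 mqtt-port '"1883"'; do
  if is_valid_port "${candidate}"; then
    echo "${candidate}: ok"
  else
    echo "${candidate}: not a plain number"
  fi
done
```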

If you’re using TLS on your MQTT server, then copy over the CA (for example, /etc/pki/tls/certs/ca-bundle.crt) and set the OTR_CAFILE config option to point to the file as it will be inside the container. This will automatically enable TLS connection to your MQTT server.

sudo cp /etc/pki/tls/certs/ca-bundle.crt /var/lib/owntracks/config/ca.crt
echo 'OTR_CAFILE="/config/ca.crt"' | sudo tee -a /var/lib/owntracks/config/recorder.conf

Next get the Lua scripts ready which will allow recorder to forward HTTP events on to MQTT. We will write a file called hook.lua to run the script, which is referenced in the config above. It has a JSON dependency, which we will download from the Internet.

wget http://regex.info/code/JSON.lua
sudo mv JSON.lua /var/lib/owntracks/scripts/JSON.lua
cat << EOF | sudo tee /var/lib/owntracks/scripts/hook.lua
JSON = (loadfile "/scripts/JSON.lua")()

function otr_init()
end

function otr_exit()
end

function otr_hook(topic, _type, data)
    otr.log("DEBUG_PUB:" .. topic .. " " .. JSON:encode(data))
    if(data['_http'] == true) then
        if(data['_repub'] == true) then
           return
        end
        data['_repub'] = true
        local payload = JSON:encode(data)
        otr.publish(topic, payload, 1, 1)
    end
end

function otr_putrec(u, d, s)
        j = JSON:decode(s)
        if (j['_repub'] == true) then
                return 1
        end
end
EOF

Next we can run the container for recorder. We will map in all of the directories we created earlier, and the configuration we created should be read in when the program in the container starts. Note that the :Z option sets the SELinux context on those config files.

docker run -dit --name recorder \
--restart always \
-p 8083:8083 \
-v /var/lib/owntracks/store:/store:Z \
-v /var/lib/owntracks/config:/config:Z \
-v /var/lib/owntracks/scripts:/scripts:Z \
-v /etc/localtime:/etc/localtime:ro \
owntracks/recorder

OwnTracks should now be listening on port 8083, waiting for connections to come in through nginx!

Friends with OwnTracks

To set up friends in HTTP mode we need to get a shell on the container and load friends data into the database.

docker exec -it recorder /bin/sh

Inside the container we load friends data into the database. Let’s use our three friends as an example, Alice with her phone pixel3xl, Bob with his pixel4 and Charlie with her pixel3a, to set up notifications for everyone.

ocat --load=friends << EOF
alice-pixel3xl [ "bob/pixel4", "charlie/pixel3a" ]
bob-pixel4 [ "alice/pixel3xl", "charlie/pixel3a" ]
charlie-pixel3a [ "alice/pixel3xl", "bob/pixel4" ]
EOF

We can dump the friends data to see what we’ve loaded, then exit the container.

ocat -S /store --dump=friends
exit
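With more than a handful of people, typing those lines by hand gets error-prone (one missing quote is enough). Here is a rough sketch of generating them, assuming the format is exactly what the example above shows: a user-device key followed by a JSON array of everyone else’s user/device. The function name is made up.

```shell
# Print one friends line per tracker, listing every other tracker.
# Arguments are "user/device" pairs; the output format is copied from
# the ocat example above.
friends_lines() {
  for me in "$@"; do
    others=""
    for other in "$@"; do
      [ "${other}" = "${me}" ] && continue
      others="${others:+${others}, }\"${other}\""
    done
    printf '%s [ %s ]\n' "$(echo "${me}" | tr '/' '-')" "${others}"
  done
}

friends_lines alice/pixel3xl bob/pixel4 charlie/pixel3a
```

The output could then be piped into ocat --load=friends inside the container instead of using a heredoc.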

Now, whenever Alice, Bob or Charlie update their location, recorder will return JSON data with the location of the other two. OwnTracks will then display that information under the Friends tab. Unfortunately, the one thing HTTP mode doesn’t support is Regions notifications, which notify you when friends enter or leave defined waypoints, but I’ve found OwnTracks to be much more reliable with HTTP so I guess that’s a small price to pay…


December 14, 2019

Audiobooks – November 2019

Exactly: How Precision Engineers Created the Modern World by Simon Winchester

Starting from the early 18th century, each chapter covers increasingly greater accuracy and the technology that needed and used it. Nice read 8/10

The Secret Cyclist: Real Life as a Rider in the Professional Peloton by The Secret Cyclist

An okay read although I don’t follow the sport so had never heard of most of the names. It is still readable however and gives a good feel for the world. 6/10

Braving It: A Father, a Daughter, and an Unforgettable Journey into the Alaskan Wild by James Campbell

A father takes his 15 year-old daughter for two trips to a remote cabin and a 3rd trip hiking/canoeing along a remote river in Alaska. Well written and interesting. 8/10

The Left Behind: Decline and Rage in Rural America by Robert Wuthnow

Based on interviews with small town Americans, it talks about their lives and frustrations with Washington, which they see as distant but interfering. 7/10

World War Z: An Oral History of the Zombie War by Max Brooks

This was the “almost” full text version. Lots of different actors reading each chapter (which are arranged as interviews). Great story and presentation works well. 9/10

Booting temporary firmware on the Raptor Blackbird

In a future post, I’ll detail how to build my ported-to-upstream Blackbird firmware. Here though, we’ll explore booting some firmware temporarily to experiment.

Step 1: Copy your new PNOR image over to the BMC.
Step 2: …
Step 3: Profit!

Okay, not really. Once you’ve copied over your image, ensure the computer is off. Then you can tell the daemon that provides firmware to the host to use a file backend rather than the PNOR chip on the motherboard (i.e. yes, you can boot your system even when the firmware chip isn’t there, although I’ve not literally tried this).

root@blackbird:~# mboxctl --backend file:/tmp/blackbird.pnor 
SetBackend: Success
root@blackbird:~# obmcutil poweron

If we look at the serial console (ssh to the BMC port 2200) we’ll see Hostboot start, realise there’s newer SBE code, flash it, and reboot:

--== Welcome to Hostboot hostboot-b284071/hbicore.bin ==--

  3.02606|secure|SecureROM valid - enabling functionality
  5.14678|Booting from SBE side 0 on master proc=00050000
  5.18537|ISTEP  6. 5 - host_init_fsi
  5.47985|ISTEP  6. 6 - host_set_ipl_parms
  5.54476|ISTEP  6. 7 - host_discover_targets
  6.56106|HWAS|PRESENT> DIMM[03]=8080000000000000
  6.56108|HWAS|PRESENT> Proc[05]=8000000000000000
  6.56109|HWAS|PRESENT> Core[07]=1511540000000000
  6.61373|ISTEP  6. 8 - host_update_master_tpm
  6.61529|SECURE|Security Access Bit> 0x0000000000000000
  6.61530|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
  6.61543|ISTEP  6. 9 - host_gard
  7.20987|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
  7.20988|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
  7.20989|HWAS|FUNCTIONAL> Core[07]=1511540000000000
  7.21299|ISTEP  6.11 - host_start_occ_xstop_handler
  8.28965|ISTEP  6.12 - host_voltage_config
  8.47973|ISTEP  7. 1 - mss_attr_cleanup
  9.07674|ISTEP  7. 2 - mss_volt
  9.35627|ISTEP  7. 3 - mss_freq
  9.63029|ISTEP  7. 4 - mss_eff_config
 10.35189|ISTEP  7. 5 - mss_attr_update
 10.38489|ISTEP  8. 1 - host_slave_sbe_config
 10.45332|ISTEP  8. 2 - host_setup_sbe
 10.45450|ISTEP  8. 3 - host_cbs_start
 10.45574|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
 10.48675|ISTEP  8. 5 - host_attnlisten_proc
 10.50338|ISTEP  8. 6 - host_p9_fbc_eff_config
 10.50771|ISTEP  8. 7 - host_p9_eff_config_links
 10.53338|ISTEP  8. 8 - proc_attr_update
 10.53634|ISTEP  8. 9 - proc_chiplet_fabric_scominit
 10.55234|ISTEP  8.10 - proc_xbus_scominit
 10.56202|ISTEP  8.11 - proc_xbus_enable_ridi
 10.57788|ISTEP  8.12 - host_set_voltages
 10.59421|ISTEP  9. 1 - fabric_erepair
 10.65877|ISTEP  9. 2 - fabric_io_dccal
 10.66048|ISTEP  9. 3 - fabric_pre_trainadv
 10.66665|ISTEP  9. 4 - fabric_io_run_training
 10.66860|ISTEP  9. 5 - fabric_post_trainadv
 10.67060|ISTEP  9. 6 - proc_smp_link_layer
 10.67503|ISTEP  9. 7 - proc_fab_iovalid
 11.10386|ISTEP  9. 8 - host_fbc_eff_config_aggregate
 11.15103|ISTEP 10. 1 - proc_build_smp
 11.27537|ISTEP 10. 2 - host_slave_sbe_update
 11.68581|sbe|System Performing SBE Update for PROC 0, side 0
 34.50467|sbe|System Rebooting To Complete SBE Update Process
 34.50595|IPMI: Initiate power cycle
 34.54671|Stopping istep dispatcher
 34.68729|IPMI: shutdown complete

One of the improvements is we now get output from the SBE! This means that when we do things like mess up secure boot and non secure boot firmware (I’ll explain why/how this is a thing later), we’ll actually get something useful out of a serial port:

--== Welcome to SBE - CommitId[0x8b06b5c1] ==--
istep 3.19
istep 3.20
istep 3.21
istep 3.22
istep 4.1
istep 4.2
istep 4.3
istep 4.4
istep 4.5
istep 4.6
istep 4.7
istep 4.8
istep 4.9
istep 4.10
istep 4.11
istep 4.12
istep 4.13
istep 4.14
istep 4.15
istep 4.16
istep 4.17
istep 4.18
istep 4.19
istep 4.20
istep 4.21
istep 4.22
istep 4.23
istep 4.24
istep 4.25
istep 4.26
istep 4.27
istep 4.28
istep 4.29
istep 4.30
istep 4.31
istep 4.32
istep 4.33
istep 4.34
istep 5.1
istep 5.2
SBE starting hostboot

And then we’re back into normal Hostboot boot (which we’ve all seen before) and end up at a newer petitboot!

Petitboot 1.11 on a Raptor Blackbird

One notable absence from that screenshot is my installed Fedora. This is because there appears to be a bug in the 5.3.7 kernel that’s currently upstream, and if we drop to the shell and poke at lspci and dmesg, we can work out what could be the culprit:

Exiting petitboot. Type 'exit' to return.
You may run 'pb-sos' to gather diagnostic data
No password set, running as root. You may set a password in the System Configuration screen.
# lspci
0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0001:01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a8 (rev 03)
0002:00:00.0 PCI bridge: IBM Device 04c1
0002:01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
# dmesg|grep -i nvme
[    2.991038] nvme nvme0: pci function 0001:01:00.0
[    2.991088] nvme 0001:01:00.0: enabling device (0140 -> 0142)
[    3.121799] nvme nvme0: Identify Controller failed (19)
[    3.121802] nvme nvme0: Removing after probe failure status: -5
# uname -a
Linux skiroot 5.3.7-openpower1 #2 SMP Sat Dec 14 09:06:20 PST 2019 ppc64le GNU/Linux

If for some reason the device didn’t show up in lspci, then I’d look at the skiboot firmware log, which is /sys/firmware/opal/msglog.

Looking at upstream stable kernel patches, it seems like 5.3.8 has an interesting-looking patch when you realize that ppc64le uses a 64k page size:

commit efac0f186ea654e8389f5017c7f643ef48cb4b93
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Oct 18 10:53:14 2019 +0800

    nvme-pci: Set the prp2 correctly when using more than 4k page
    
    commit a4f40484e7f1dff56bb9f286cc59ffa36e0259eb upstream.
    
    In the current code, the nvme is using a fixed 4k PRP entry size,
    but if the kernel use a page size which is more than 4k, we should
    consider the situation that the bv_offset may be larger than the
    dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
    cause the command can't be executed correctly.
    
    Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
    Cc: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Kevin Hao <haokexin@gmail.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
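If you’re not sure what page size your own kernel uses, getconf will tell you. A small sketch (the function name is mine) that reports it in KiB and flags the larger-than-4k case the commit message talks about:

```shell
# Report the kernel page size in KiB; getconf PAGESIZE prints it in bytes.
page_size_kib() {
  echo "$(( $(getconf PAGESIZE) / 1024 ))"
}

kib="$(page_size_kib)"
if [ "${kib}" -gt 4 ]; then
  echo "page size is ${kib}k, larger than 4k (the case the commit describes)"
else
  echo "page size is ${kib}k"
fi
```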

So, time to go try 5.3.8. My yaks are getting quite smooth.

Oh, and when you’re done with your temporary firmware, either fiddle with mboxctl or restart the systemd service for it, or reboot your BMC or… well, I gotta leave you something to work out on your own :)

Building OpenPOWER firmware on Fedora 31

One of the challenges with Fedora 31 is that /usr/bin/python is now Python 3 rather than Python 2. Just about every python script in existence relies on /usr/bin/python being Python 2 and not anything else. I can’t really recall, but this probably happened with the 1.5 to 2 transition as well (although IIRC that was less breaking).
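A quick way to see which Python you’re actually getting from /usr/bin/python is to ask it. The sketch below uses a helper name I made up; the only subtlety is that Python 2 prints its version to stderr, so both streams are captured.

```shell
# Extract the major version from a "Python X.Y.Z" version string.
python_major() {
  echo "$1" | sed -n 's/^Python \([0-9][0-9]*\)\..*/\1/p'
}

# Python 2 prints its version to stderr, hence the 2>&1.
version="$(/usr/bin/python --version 2>&1)" || version="not installed"
echo "/usr/bin/python reports: ${version}"
echo "major version: $(python_major "${version}")"
```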

What this means is that for projects that are half-way through converting to python 3, everything breaks.

op-build is one of these projects.

So, we need:

After all that, you can actually build a pnor image on Fedora 31. Even on Fedora 31 ppc64le, which is literally what I’ve just done.

December 13, 2019

Upstreaming Blackbird firmware (step 1: skiboot)

Now that I can actually boot the machine, I could test and send my patch upstream for Blackbird support in skiboot. One thing I noticed with the current firmware from Raptor is that the PCIe slot names were wrong. While a pretty minor point, it’s a bit funny that there’s only two slots and the names were wrong.

The PCIe slot names are used to call out the physical location of PCIe cards in the system, so if you, say, hit a bunch of errors, OS/firmware can say “It’s this card in the slot labeled BLAH on the board”.

With my patch, the slot table from skiboot is spat out looking like this:

[   64.296743001,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=SLOT1 PCIE 4.0 X16 
[   64.296875483,5] PHB#0001:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=SLOT2 PCIE 4.0 X8
[   64.297054197,5] PHB#0001:01:00.0 [EP  ] 8086 f1a8 R:03 C:010802 (  mass-storage) LOC_CODE=SLOT2 PCIE 4.0 X8
[   64.297285067,5] PHB#0002:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin SATA
[   64.297411565,5] PHB#0002:01:00.0 [LGCY] 1b4b 9235 R:11 C:010601 (          sata) LOC_CODE=Builtin SATA
[   64.297554540,5] PHB#0003:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin USB
[   64.297732049,5] PHB#0003:01:00.0 [EP  ] 104c 8241 R:02 C:0c0330 (      usb-xhci) LOC_CODE=Builtin USB
[   64.297848624,5] PHB#0004:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin Ethernet
[   64.298026870,5] PHB#0004:01:00.0 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   64.298212291,5] PHB#0004:01:00.1 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   64.298424962,5] PHB#0004:01:00.2 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   64.298587848,5] PHB#0005:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..02 SLOT=BMC
[   64.298722540,5] PHB#0005:01:00.0 [ETOX] 1a03 1150 R:04 C:060400 B:02..02 LOC_CODE=BMC
[   64.298850009,5] PHB#0005:02:00.0 [PCID] 1a03 2000 R:41 C:030000 (           vga) LOC_CODE=BMC

If you want to give it a go, grab the patch, build skiboot, and flash it on. Alternatively, you can download a built skiboot here. To flash it, do this:

# Copy to your BMC for the Blackbird
scp skiboot-v6.5-146-g376bed3f.lid.xz.stb root@blackbird:/tmp/

# then, ssh to the BMC
ssh root@blackbird

# ensure the machine is off
obmcutil poweroff --wait

# Now, make a backup copy (remember to copy it off /tmp on the bmc)
pflash -P PAYLOAD -r /tmp/skiboot-backup

# and flash the new skiboot:
pflash -e -P PAYLOAD -p /tmp/skiboot-v6.5-146-g376bed3f.lid.xz.stb

# now, power on the box
obmcutil poweron
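Since a truncated copy is a bad thing to flash, it’s worth comparing checksums between your workstation and the BMC before running pflash. A minimal sketch, assuming sha256sum is available on both ends (I have not checked every BMC image for it) and using a helper name of my own:

```shell
# Succeed if the SHA-256 of file $1 equals the expected hex digest $2.
verify_sha256() {
  actual="$(sha256sum "$1" | cut -d' ' -f1)"
  [ "${actual}" = "$2" ]
}

# Demonstration with a throwaway file standing in for the firmware image.
demo="$(mktemp)"
echo "pretend firmware" > "${demo}"
expected="$(sha256sum "${demo}" | cut -d' ' -f1)"   # normally computed on your workstation
if verify_sha256 "${demo}" "${expected}"; then
  echo "checksum matches"
else
  echo "checksum mismatch, copy the file again"
fi
rm -f "${demo}"
```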

Black(bird) boots!

Well, after the half false start of not having RAM so really not being able to do much (yeah yeah, I hear you: I’m weak for not just running Linux in L3), my RAM arrived today. Putting the sticks in was easy (of course), although it does not make for an exciting photo.

One DIMM in the Blackbird

After that, I SSH’d to the BMC and then did “obmcutil poweron” (as is traditional) and started looking at the console by connecting via SSH to port 2200 on the BMC. I was then greeted by the (by this time in my life rather familiar) Hostboot:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--
   3.02902|secure|SecureROM valid - enabling functionality
   7.15613|Booting from SBE side 0 on master proc=00050000
   7.19697|ISTEP  6. 5 - host_init_fsi
   7.54226|ISTEP  6. 6 - host_set_ipl_parms
   8.06280|ISTEP  6. 7 - host_discover_targets
   9.19791|HWAS|PRESENT> DIMM[03]=8080000000000000
   9.19792|HWAS|PRESENT> Proc[05]=8000000000000000
   9.19794|HWAS|PRESENT> Core[07]=1511540000000000
   9.55305|ISTEP  6. 8 - host_update_master_tpm
   9.60521|SECURE|Security Access Bit> 0x0000000000000000
   9.60522|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
   9.63093|ISTEP  6. 9 - host_gard
   9.89867|HWAS|Blocking Speculative Deconfig
   9.90128|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
   9.90129|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
   9.90130|HWAS|FUNCTIONAL> Core[07]=1511540000000000
   9.90329|ISTEP  6.11 - host_start_occ_xstop_handler
  11.19092|ISTEP  6.12 - host_voltage_config
  11.30246|ISTEP  7. 1 - mss_attr_cleanup
  12.61924|ISTEP  7. 2 - mss_volt
  12.92705|ISTEP  7. 3 - mss_freq
  13.67475|ISTEP  7. 4 - mss_eff_config
  14.95827|ISTEP  7. 5 - mss_attr_update
  14.97307|ISTEP  8. 1 - host_slave_sbe_config
  15.05372|ISTEP  8. 2 - host_setup_sbe
  15.10258|ISTEP  8. 3 - host_cbs_start
  15.10381|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
  15.11144|ISTEP  8. 5 - host_attnlisten_proc
  15.11213|ISTEP  8. 6 - host_p9_fbc_eff_config
  15.13552|ISTEP  8. 7 - host_p9_eff_config_links
  15.20087|ISTEP  8. 8 - proc_attr_update
  15.20191|ISTEP  8. 9 - proc_chiplet_fabric_scominit
  15.21891|ISTEP  8.10 - proc_xbus_scominit
  15.22929|ISTEP  8.11 - proc_xbus_enable_ridi
  15.24717|ISTEP  8.12 - host_set_voltages
  15.26620|ISTEP  9. 1 - fabric_erepair
  15.42123|ISTEP  9. 2 - fabric_io_dccal
  15.42436|ISTEP  9. 3 - fabric_pre_trainadv
  15.42887|ISTEP  9. 4 - fabric_io_run_training
  15.43207|ISTEP  9. 5 - fabric_post_trainadv
  15.44893|ISTEP  9. 6 - proc_smp_link_layer
  15.45454|ISTEP  9. 7 - proc_fab_iovalid
  15.87126|ISTEP  9. 8 - host_fbc_eff_config_aggregate
  15.89174|ISTEP 10. 1 - proc_build_smp
  16.54194|ISTEP 10. 2 - host_slave_sbe_update
  18.63876|sbe|System Performing SBE Update for PROC 0, side 0
  41.69727|sbe|System Rebooting To Complete SBE Update Process
  41.72189|IPMI: Initiate power cycle
  42.40652|IPMI: shutdown complete

The first IPL updated the Self Boot Engine firmware on the chip, so it automatically applied the new firmware and rebooted to finish applying it. This is perfectly normal, it just shows itself as a longer boot time. Booting continues:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--
   3.02810|secure|SecureROM valid - enabling functionality
   6.07331|Booting from SBE side 0 on master proc=00050000
   6.11485|ISTEP  6. 5 - host_init_fsi
   6.60361|ISTEP  6. 6 - host_set_ipl_parms
   6.98640|ISTEP  6. 7 - host_discover_targets
   7.53975|HWAS|PRESENT> DIMM[03]=8080000000000000
   7.53976|HWAS|PRESENT> Proc[05]=8000000000000000
   7.53977|HWAS|PRESENT> Core[07]=1511540000000000
   7.79123|ISTEP  6. 8 - host_update_master_tpm
   7.79263|SECURE|Security Access Bit> 0x0000000000000000
   7.79264|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
   7.82684|ISTEP  6. 9 - host_gard
   8.26609|HWAS|Blocking Speculative Deconfig
   8.26865|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
   8.26866|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
   8.26867|HWAS|FUNCTIONAL> Core[07]=1511540000000000
   8.27142|ISTEP  6.11 - host_start_occ_xstop_handler
   9.69606|ISTEP  6.12 - host_voltage_config
   9.81183|ISTEP  7. 1 - mss_attr_cleanup
  10.95130|ISTEP  7. 2 - mss_volt
  11.39875|ISTEP  7. 3 - mss_freq
  12.15655|ISTEP  7. 4 - mss_eff_config
  13.63504|ISTEP  7. 5 - mss_attr_update
  13.65162|ISTEP  8. 1 - host_slave_sbe_config
  13.78039|ISTEP  8. 2 - host_setup_sbe
  13.78143|ISTEP  8. 3 - host_cbs_start
  13.78247|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
  13.79015|ISTEP  8. 5 - host_attnlisten_proc
  13.79114|ISTEP  8. 6 - host_p9_fbc_eff_config
  13.79734|ISTEP  8. 7 - host_p9_eff_config_links
  13.85128|ISTEP  8. 8 - proc_attr_update
  13.85783|ISTEP  8. 9 - proc_chiplet_fabric_scominit
  13.87991|ISTEP  8.10 - proc_xbus_scominit
  13.89056|ISTEP  8.11 - proc_xbus_enable_ridi
  13.91122|ISTEP  8.12 - host_set_voltages
  13.93077|ISTEP  9. 1 - fabric_erepair
  14.05235|ISTEP  9. 2 - fabric_io_dccal
  14.13131|ISTEP  9. 3 - fabric_pre_trainadv
  14.13616|ISTEP  9. 4 - fabric_io_run_training
  14.13934|ISTEP  9. 5 - fabric_post_trainadv
  14.14087|ISTEP  9. 6 - proc_smp_link_layer
  14.14656|ISTEP  9. 7 - proc_fab_iovalid
  14.59454|ISTEP  9. 8 - host_fbc_eff_config_aggregate
  14.61811|ISTEP 10. 1 - proc_build_smp
  15.24074|ISTEP 10. 2 - host_slave_sbe_update
  17.16022|sbe|System Performing SBE Update for PROC 0, side 1
  40.16808|ISTEP 10. 4 - proc_cen_ref_clk_enable
  40.27866|ISTEP 10. 5 - proc_enable_osclite
  40.31297|ISTEP 10. 6 - proc_chiplet_scominit
  40.55805|ISTEP 10. 7 - proc_abus_scominit
  40.57942|ISTEP 10. 8 - proc_obus_scominit
  40.58078|ISTEP 10. 9 - proc_npu_scominit
  40.60704|ISTEP 10.10 - proc_pcie_scominit
  40.66572|ISTEP 10.11 - proc_scomoverride_chiplets
  40.66874|ISTEP 10.12 - proc_chiplet_enable_ridi
  40.68407|ISTEP 10.13 - host_rng_bist
  40.75548|ISTEP 10.14 - host_update_redundant_tpm
  40.75785|ISTEP 11. 1 - host_prd_hwreconfig
  41.15067|ISTEP 11. 2 - cen_tp_chiplet_init1
  41.15299|ISTEP 11. 3 - cen_pll_initf
  41.15544|ISTEP 11. 4 - cen_pll_setup
  41.18530|ISTEP 11. 5 - cen_tp_chiplet_init2
  41.18762|ISTEP 11. 6 - cen_tp_arrayinit
  41.19050|ISTEP 11. 7 - cen_tp_chiplet_init3
  41.19286|ISTEP 11. 8 - cen_chiplet_init
  41.19553|ISTEP 11. 9 - cen_arrayinit
  41.19986|ISTEP 11.10 - cen_initf
  41.20215|ISTEP 11.11 - cen_do_manual_inits
  41.20497|ISTEP 11.12 - cen_startclocks
  41.20802|ISTEP 11.13 - cen_scominits
  41.21171|ISTEP 12. 1 - mss_getecid
  42.25709|ISTEP 12. 2 - dmi_attr_update
  42.30382|ISTEP 12. 3 - proc_dmi_scominit
  42.32572|ISTEP 12. 4 - cen_dmi_scominit
  42.32798|ISTEP 12. 5 - dmi_erepair
  42.35000|ISTEP 12. 6 - dmi_io_dccal
  42.35218|ISTEP 12. 7 - dmi_pre_trainadv
  42.35489|ISTEP 12. 8 - dmi_io_run_training
  42.37076|ISTEP 12. 9 - dmi_post_trainadv
  42.39541|ISTEP 12.10 - proc_cen_framelock
  42.40772|ISTEP 12.11 - host_startprd_dmi
  42.41974|ISTEP 12.12 - host_attnlisten_memb
  42.44506|ISTEP 12.13 - cen_set_inband_addr
  42.58832|ISTEP 13. 1 - host_disable_memvolt
  43.67808|ISTEP 13. 2 - mem_pll_reset
  43.75070|ISTEP 13. 3 - mem_pll_initf
  43.85043|ISTEP 13. 4 - mem_pll_setup
  43.87372|ISTEP 13. 6 - mem_startclocks
  43.88970|ISTEP 13. 7 - host_enable_memvolt
  43.89177|ISTEP 13. 8 - mss_scominit
  45.10013|ISTEP 13. 9 - mss_ddr_phy_reset
  45.38105|ISTEP 13.10 - mss_draminit
  45.95447|ISTEP 13.11 - mss_draminit_training
  47.20963|ISTEP 13.12 - mss_draminit_trainadv
  47.32161|ISTEP 13.13 - mss_draminit_mc
  47.49186|ISTEP 14. 1 - mss_memdiag
  69.53224|ISTEP 14. 2 - mss_thermal_init
  69.66891|ISTEP 14. 3 - proc_pcie_config
  69.71959|ISTEP 14. 4 - mss_power_cleanup
  69.72385|ISTEP 14. 5 - proc_setup_bars
  69.83889|ISTEP 14. 6 - proc_htm_setup
  69.84748|ISTEP 14. 7 - proc_exit_cache_contained
  69.89430|ISTEP 15. 1 - host_build_stop_image
  73.08679|ISTEP 15. 2 - proc_set_pba_homer_bar
  73.12352|ISTEP 15. 3 - host_establish_ex_chiplet
  73.13714|ISTEP 15. 4 - host_start_stop_engine
  73.19059|ISTEP 16. 1 - host_activate_master
  74.44590|ISTEP 16. 2 - host_activate_slave_cores
  74.53820|ISTEP 16. 3 - host_secure_rng
  74.54651|ISTEP 16. 4 - mss_scrub
  74.56565|ISTEP 16. 5 - host_load_io_ppe
  74.78752|ISTEP 16. 6 - host_ipl_complete
  75.50085|ISTEP 18.11 - proc_tod_setup
  75.94190|ISTEP 18.12 - proc_tod_init
  75.97575|ISTEP 20. 1 - host_load_payload
  77.12340|ISTEP 20. 2 - host_load_hdat
  78.05195|ISTEP 21. 1 - host_runtime_setup
  83.87001|htmgt|OCCs are now running in ACTIVE state
  89.72649|ISTEP 21. 2 - host_verify_hdat
  89.77252|ISTEP 21. 3 - host_start_payload
 [   90.400516933,5] OPAL skiboot-c81f9d6 starting…
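
Logs in this format lend themselves to a quick timing breakdown. The following is a minimal sketch, assuming the console output has been saved as text and that the time spent in a step is roughly the gap until the next ISTEP line; the names `istep_durations` and `SAMPLE` are made up for illustration and are not part of Hostboot.

```python
import re

# Matches Hostboot console lines such as " 47.49186|ISTEP 14. 1 - mss_memdiag".
ISTEP_RE = re.compile(r"^\s*(\d+\.\d+)\|ISTEP\s+(\d+)\.\s*(\d+)\s+-\s+(\S+)")


def istep_durations(log_text):
    """Return (seconds, istep_name) pairs, longest first.

    A step's duration is approximated as the gap between its timestamp and
    the next ISTEP line, so the final step in the text gets no entry.
    """
    steps = []
    for line in log_text.splitlines():
        match = ISTEP_RE.match(line)
        if match:
            steps.append((float(match.group(1)), match.group(4)))

    durations = [
        (round(next_start - start, 5), name)
        for (start, name), (next_start, _next_name) in zip(steps, steps[1:])
    ]
    return sorted(durations, reverse=True)


# A few lines copied from the log above.
SAMPLE = """\
 47.32161|ISTEP 13.13 - mss_draminit_mc
 47.49186|ISTEP 14. 1 - mss_memdiag
 69.53224|ISTEP 14. 2 - mss_thermal_init
 69.66891|ISTEP 14. 3 - proc_pcie_config
"""

for seconds, name in istep_durations(SAMPLE):
    print(f"{seconds:9.5f}s  {name}")
```

On the excerpt above this puts mss_memdiag at the top with roughly 22 seconds, which matches the jump from 47.49186 to 69.53224 in the log.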

The rest of the skiboot log was also spat out, and then the familiar Petitboot screen:

Welcome to Petitboot!

It lives! I even had a bit of a look at the sensors to see power consumption and temperatures. All looks good:

ipmitool sdr|grep -v ns
 occ0             | 0x00              | ok
 occ1             | 0x00              | ok
 p0_core3_temp    | 51 degrees C      | ok
 p0_core5_temp    | 49 degrees C      | ok
 p0_core7_temp    | 50 degrees C      | ok
 p0_core11_temp   | 49 degrees C      | ok
 p0_core15_temp   | 50 degrees C      | ok
 p0_core17_temp   | 50 degrees C      | ok
 p0_core19_temp   | 50 degrees C      | ok
 p0_core21_temp   | 50 degrees C      | ok
 dimm0_temp       | 36 degrees C      | ok
 dimm4_temp       | 39 degrees C      | ok
 fan0             | 1300 RPM          | ok
 fan1             | 1200 RPM          | ok
 fan2             | 1000 RPM          | ok
 p0_power         | 60 Watts          | ok
 p0_vdd_power     | 31 Watts          | ok
 p0_vdn_power     | 10 Watts          | ok
 cpu_1_ambient    | 30.90 degrees C   | ok
 pcie             | 27 degrees C      | ok
 ambient          | 25.40 degrees C   | ok
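
If you want to keep an eye on these readings over time, the table is easy to parse. This is a small sketch, assuming the three-column `name | reading | status` layout shown above; `parse_sdr` and `hottest_core` are hypothetical helper names, not part of ipmitool.

```python
def parse_sdr(text):
    """Parse `ipmitool sdr` rows of the form 'name | reading | status'."""
    sensors = {}
    for line in text.splitlines():
        parts = [field.strip() for field in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank lines or anything that is not a sensor row
        name, reading, status = parts
        sensors[name] = (reading, status)
    return sensors


def hottest_core(sensors):
    """Return (name, degrees_c) for the warmest '*_core*_temp' sensor."""
    temps = {
        name: float(reading.split()[0])
        for name, (reading, _status) in sensors.items()
        if "_core" in name and name.endswith("_temp") and "degrees C" in reading
    }
    name = max(temps, key=temps.get)
    return name, temps[name]


# A few rows copied from the output above.
SAMPLE = """\
 p0_core3_temp    | 51 degrees C      | ok
 p0_core5_temp    | 49 degrees C      | ok
 dimm0_temp       | 36 degrees C      | ok
 p0_power         | 60 Watts          | ok
"""

print(hottest_core(parse_sdr(SAMPLE)))
```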

Next up? I guess I should install an OS.

Coming to grips with Kubernetes in 2020: podcasts

It has become clear to me that it is time to care about Kubernetes more. I’m sure many people have cared for ages, but the things I want to build at the moment are starting to be more container based now that I am thinking more at the application layer than the cloud infrastructure layer. So how to do that? I thought I’d write down some notes on what has worked (or not) for me, in the hope it will help others. In this post, podcasts.

I thought podcasts would be an interesting way to get started with some nice overviews. This is especially true because I’m already a pretty heavy podcast user, so it was easy to slot into my existing routine. Unfortunately this hasn’t really worked out. I started with the podctl podcast, but they only ever talk about Red Hat stuff. It is very rare for a guest to not be a Red Hat employee for example. The presenters of this podcast seem to also really dislike OpenStack for reasons they never explain, which is annoying.

Then I figured maybe the Google Kubernetes podcast would be better, but it often lacks the depth I am interested in.

I am yet to find a good podcast which deep dives into technology instead of just talking about what is in the latest release. So maybe these podcasts are useful if you’re interested in what dropped in the most recent release, but they’re neither a good nor a systematic way to get introduced to Kubernetes.

That said, I only just discovered the TGI Kubernetes YouTube channel yesterday. It is not really what I wanted in a podcast given it’s a video blog, but I think it has prospects to be interesting. I will update this post when I’ve had a chance to check it out in more depth.

Have you found a good Kubernetes podcast? Am I being wildly unfair?

December 12, 2019

Audiobooks – October 2019

The Story of the British Isles in 100 Places by Neil Oliver

Covers what you’d expect with a good attempt not just to hit the “history 101” places. Author has an accent that takes a while to get used to. 7/10

Death’s End – Cixin Liu

3rd in Trilogy wrapping things mostly up. Just a few characters so easy to keep track of them. If you liked the previous books you’ll like this one. 7/10

Building the Cycling City: The Dutch Blueprint for Urban Vitality by Melissa & Chris Bruntlett

Talking about Dutch Cycling culture. Compares 5 different cities (some car orientated) and how they differ in their cycling journey. 7/10

Scrappy Little Nobody by Anna Kendrick

A general memoir by the actress. A bit disjointed & unsystematic and by no means a tell-all. A few good stories sprinkled in. 6/10

The $100 Startup by Chris Guillebeau

Lots of case studies of businesses built off relatively little capital (and usually staying small). Plenty of good advice although lists don’t translate well in audio. 7/10

Atomic Adventures: Secret Islands, Forgotten N-Rays, and Isotopic Murder - A Journey into the Wild World of Nuclear Science by James Mahaffey

A bunch of really good stories from the Atomic age (not just the usual ones) including a view from inside of the Cold Fusion fiasco. 8/10

December 10, 2019

Looking at the state of Blackbird firmware

Having been somewhat involved in OpenPOWER firmware, I have a bunch of experience and opinions on maintaining firmware trees for products, what working with upstream looks like and all that.

So, with my new Blackbird system I decided to take a bit of a look as to what the firmware situation was like.

There are two main parts of the firmware: BMC and Host. The BMC firmware runs purely on the ASPEED AST2500 and is based on OpenBMC, while the host firmware is what runs on the POWER9 and is based on OpenPOWER Firmware as assembled by op-build.

My initial impression of the BMC is that there doesn’t seem to be any web-based UI for it, which is kind of disappointing, as the Web UI being developed upstream has some nice qualities, and I’d say I even enjoyed using it when it was built into BMC firmware for systems we had when I was at IBM.

Looking at the git trees, the raptor-v1.00 tag is OpenBMC 2.7.0-dev-533-g386e5602e while current master is 2.8.0-dev-960-g10f7830bd. The spot where it split off was 2.7.0-dev-430-g7443ee80b, from April 2019 – so it’s not too old, but I’m also not convinced there haven’t been some security patches since then.

I’m not sure if any of the OpenBMC code is upstream, I haven’t looked.

Unfortunately, none of the host firmware is upstream.

On the host firmware side, v2.3-rc2-67-ga6a5f142 is the Raptor tag, and that compares with current master of v2.4-305-g54d8daf4. The place where Raptor forked was v2.3-rc2-9-g7b556015, again in April of 2019. Considering there was an upstream release in May of 2019 (v2.3), and again in July (v2.4), the Blackbird support could easily have made it into an upstream release.
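
The version strings quoted here are in `git describe` format (`<tag>-<commits since tag>-g<abbreviated hash>`), so the size of the out-of-tree delta can be read straight off them. Below is a minimal sketch, assuming the fork point is an ancestor of the vendor tag and both are described relative to the same base tag; `parse_describe` and `commits_since_fork` are illustrative names only.

```python
import re

# `git describe` output: <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Split a git-describe string into (tag, commit_count, short_hash)."""
    match = DESCRIBE_RE.match(text)
    if match is None:
        raise ValueError(f"not a git-describe string: {text!r}")
    return match.group("tag"), int(match.group("count")), match.group("sha")


def commits_since_fork(vendor, fork_point):
    """Commits on the vendor side since the fork, given a shared base tag."""
    vendor_tag, vendor_count, _ = parse_describe(vendor)
    fork_tag, fork_count, _ = parse_describe(fork_point)
    if vendor_tag != fork_tag:
        raise ValueError("base tags differ; counts are not comparable")
    return vendor_count - fork_count


# OpenBMC: raptor-v1.00 tag versus the point where it split off.
print(commits_since_fork("2.7.0-dev-533-g386e5602e", "2.7.0-dev-430-g7443ee80b"))
# op-build: Raptor tag versus the fork point.
print(commits_since_fork("v2.3-rc2-67-ga6a5f142", "v2.3-rc2-9-g7b556015"))
```

Under those assumptions this works out to 103 commits on the OpenBMC side and 58 on the op-build side. The count includes merge commits, so it is only a rough measure of how much would need upstreaming.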

Unfortunately, there doesn’t seem to have been an upstream op-build release since v2.4 back in July (when I made it shortly before leaving IBM).

The skiboot component of host firmware has had an upstream release since I left (v6.5 in mid-August 2019), so the (rather trivial) platform support could have easily made it. I have a cleaned up and ready to upstream patch for it, I just need some DIMMs to actually test with before I send the patch.

As the current firmware situation stands, producing another build with updated upstream code is tricky due to the out-of-tree nature of the Blackbird patches, and a straight “git merge” is probably doable by some people, but not everybody.

On my TODO list is to get all the code into a state I can upstream it, assess vulnerability to CVE-2019-6260, and work out how I want to make it do Secure Boot (something that isn’t in upstream firmware yet, and currently would require a TPM, which I do not have).

Blackbird (singing in the dead of night..)

Way back when Raptor Computer Systems was doing pre-orders for the microATX Blackbird POWER9 system, I put in a pre-order. Since then, I’ve had a few life changes (such as moving to the US and starting to work for Amazon rather than IBM), but I’ve finally gone and done (most of) the setup for my own POWER9 system on (or under) my desk.

An 8 core POWER9 CPU, in bubble wrap and plastic packaging.

Everything came in a big brown box, all rather well packed. I had the board, CPU, heatsink assembly and the special tool to attach the heatsink to the board. Although unique to POWER9, the heatsink/fan assembly was one of the easier ones I’ve ever attached to a board.

The board itself looks pretty much as you’d expect – there’s a big spot for the CPU, a couple of PCI slots, a couple of DIMM slots and some SATA connectors.

The bits that are a bit unusual for a micro-ATX board are the big space reserved for FlexVer, the ASPEED BMC chip and the socketed flash. FlexVer is something I’m not ever going to use, and I wish there was an on-board M.2 SSD slot instead, even if it was just PCIe. Having to sacrifice a PCIe slot just for an SSD is kind of a bummer.

The Blackbird POWER9 board
The POWER9 chip in socket

One annoying thing is my DIMMs are taking their sweet time in getting here, so I couldn’t actually populate the board with any memory.

Even without memory though, you can start powering it on and see that everything else works okay (i.e. it’s not completely boned). So, even without DIMMs, I could plug it in, and observe the Hostboot firmware complaining about insufficient hardware to IPL the box.

It Lives!

Yep, out the console (via ssh) you clearly see where things fail:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--

  3.03104|secure|SecureROM valid - enabling functionality
  6.67619|Booting from SBE side 0 on master proc=00050000
  6.85100|ISTEP  6. 5 - host_init_fsi
  7.23753|ISTEP  6. 6 - host_set_ipl_parms
  7.71759|ISTEP  6. 7 - host_discover_targets
 11.34738|HWAS|PRESENT> Proc[05]=8000000000000000
 11.34739|HWAS|PRESENT> Core[07]=1511540000000000
 11.69077|ISTEP  6. 8 - host_update_master_tpm
 11.73787|SECURE|Security Access Bit> 0x0000000000000000
 11.73787|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
 11.76276|ISTEP  6. 9 - host_gard
 11.96654|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
 11.96655|HWAS|FUNCTIONAL> Core[07]=1511540000000000
 12.07554|================================================
 12.07554|Error reported by hwas (0x0C00) PLID 0x90000007
 12.10289|  checkMinimumHardware found no functional dimm cards.
 12.10290|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.10291|  ReasonCode 0x0c06 RC_SYSAVAIL_NO_MEMORY_FUNC
 12.10292|  UserData1  HUID of node : 0x0002000000000000
 12.10293|  UserData2  number of present, non-functional dimms : 0x0000000000000000
 12.10294|------------------------------------------------
 12.10417|  Callout type             : Procedure Callout
 12.10417|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.10418|  Priority                 : SRCI_PRIORITY_HIGH
 12.10419|------------------------------------------------
 12.10420|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.10421|================================================
 12.51718|================================================
 12.51719|Error reported by hwas (0x0C00) PLID 0x90000007
 12.51720|  Insufficient hardware to continue.
 12.51721|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.51722|  ReasonCode 0x0c04 RC_SYSAVAIL_INSUFFICIENT_HW
 12.54457|  UserData1   : 0x0000000000000000
 12.54458|  UserData2   : 0x0000000000000000
 12.54458|------------------------------------------------
 12.54459|  Callout type             : Procedure Callout
 12.54460|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.54461|  Priority                 : SRCI_PRIORITY_HIGH
 12.54462|------------------------------------------------
 12.54462|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.54463|================================================
 12.73660|System shutting down with error status 0x90000007
 12.75545|================================================
 12.75546|Error reported by istep (0x1700) PLID 0x90000007
 12.77991|  IStep failed, see other log(s) with the same PLID for reason.
 12.77992|  ModuleId   0x01 MOD_REPORTING_ERROR
 12.77993|  ReasonCode 0x1703 RC_FAILURE
 12.77994|  UserData1  eid of first error : 0x9000000800000c04
 12.77995|  UserData2  Reason code of first error : 0x0000000100000609
 12.77996|------------------------------------------------
 12.77996|  host_gard
 12.77997|------------------------------------------------
 12.77998|  Callout type             : Procedure Callout
 12.77998|  Procedure                : EPUB_PRC_HB_CODE
 12.77999|  Priority                 : SRCI_PRIORITY_LOW
 12.78000|------------------------------------------------
 12.78001|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.78002|================================================

Looking forward to getting some DIMMs to show/share more.

December 09, 2019

systemd-nspawn and Private Networking

Currently there are two things I want to do with my PC at the same time: one is watching streaming services like ABC iView (which won’t run from non-Australian IP addresses) and the other is torrenting over a VPN. I had considered doing something ugly with iptables to try and get routing done on a per-UID basis but that seemed too difficult. At the time I wasn’t aware of the ip rule add uidrange [1] option. So setting up a private networking namespace with a systemd-nspawn container seemed like a good idea.

Chroot Setup

For the chroot (which I use as a slang term for a copy of a Linux installation in a subdirectory) I used a btrfs subvol that’s a snapshot of the root subvol. The idea is that when I upgrade the root system I can just recreate the chroot with a new snapshot.

To get this working I created files in the root subvol which are used for the container.

I created a script like the following named /usr/local/sbin/container-sshd to launch the container. It sets up the networking and executes sshd. The systemd-nspawn program is designed to launch init but that’s not required, I prefer to just launch sshd so there’s only one running process in a container that’s not being actively used.

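#!/bin/bash
# Runs inside the container: prepares /run, configures networking, starts sshd.

# restorecon commands only needed for SE Linux
/sbin/restorecon -R /dev
# give the container a fresh tmpfs on /run and create the directory sshd expects
/bin/mount none -t tmpfs /run
/bin/mkdir -p /run/sshd
/sbin/restorecon -R /run /tmp
# host0 is the container end of the private network interface created by -n
/sbin/ifconfig host0 10.3.0.2 netmask 255.255.0.0
/sbin/route add default gw 10.2.0.1
# exec so that sshd replaces this script as the only process in the container
exec /usr/sbin/sshd -D -f /etc/ssh/sshd_torrent_config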
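
When adapting the addresses in a script like this, it is worth checking that the default gateway is inside the subnet configured on host0, since a default route generally needs a gateway that is directly reachable. The following is a small sketch of that check using Python's standard `ipaddress` module; the function name `gateway_reachable` and the 10.3.0.1 alternative are assumptions for illustration.

```python
import ipaddress


def gateway_reachable(address, netmask, gateway):
    """True if `gateway` lies inside the subnet configured on the interface."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return ipaddress.ip_address(gateway) in iface.network


# Values exactly as they appear in the script above.
print(gateway_reachable("10.3.0.2", "255.255.0.0", "10.2.0.1"))
# A hypothetical gateway inside the same /16 as host0.
print(gateway_reachable("10.3.0.2", "255.255.0.0", "10.3.0.1"))
```

With the values as written the first check comes back False (10.2.0.1 is outside 10.3.0.0/16), so one of the two addresses may need adjusting to suit your own network.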

How to Launch It

To set up the container I used a command like “/usr/bin/systemd-nspawn -D /subvols/torrent -M torrent --bind=/home -n /usr/local/sbin/container-sshd“.

First I had tried the --network-ipvlan option which creates a new IP address on the same MAC address. That gave me an interface iv-br0 on the container that I could use normally (br0 being the bridge used in my workstation as its primary network interface). The IP address I assigned to that was in the same subnet as br0, but for some reason that’s unknown to me (maybe an interaction between bridging and network namespaces) I couldn’t access it from the host, I could only access it from other hosts on the network. I then tried the --network-macvlan option (to create a new MAC address for virtual networking), but that had the same problem with accessing the IP address from the local host outside the container as well as problems with MAC redirection to the primary MAC of the host (again maybe an interaction with bridging).

Then I tried just the “-n” option which gave it a private network interface. That created an interface named ve-torrent on the host side and one named host0 in the container. Using ifconfig and route to configure the interface in the container before launching sshd is easy. I haven’t yet determined a good way of configuring the host side of the private network interface automatically.
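
One possible starting point for the host side is a small helper that works out the iproute2 commands to run once the ve- interface exists. This is only a sketch: the host address 10.3.0.1/16 is an assumption chosen to sit in the same subnet as the container's host0 address, `host_side_commands` is a made-up name, and any routing or NAT beyond the link itself is left out.

```python
import ipaddress


def host_side_commands(container, host_cidr):
    """Build iproute2 argv lists for the host end of a `-n` veth pair.

    With "-M torrent -n" the host side shows up as "ve-torrent".
    """
    iface = f"ve-{container}"
    ipaddress.ip_interface(host_cidr)  # raises ValueError on a malformed address
    return [
        ["ip", "addr", "add", host_cidr, "dev", iface],
        ["ip", "link", "set", iface, "up"],
    ]


# Print the commands rather than running them; running them needs root.
for argv in host_side_commands("torrent", "10.3.0.1/16"):
    print(" ".join(argv))
```

The lists could be passed to `subprocess.run` from a wrapper script started after systemd-nspawn, but when and how to trigger that is exactly the open question here.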

I had to use a bind for /home because /home is a subvol and therefore doesn’t get included in the container by default.

How it Works

Now when it’s running I can just “ssh -X” to the container and then run graphical programs that use the VPN while at the same time running graphical programs on the main host that don’t use the VPN.

Things To Do

Find out why --network-ipvlan and --network-macvlan don’t work with communication from the same host.

Find out why --network-macvlan gives errors about MAC redirection when pinging.

Determine a good way of setting up the host side after the systemd-nspawn program has run.

Find out if there are better ways of solving this problem; this way works but might not be ideal. Comments welcome.

December 06, 2019

Audiobooks – September 2019

Off the Rails: A Train Trip Through Life by Beppe Severgnini

A collection of train journey articles (written over about 20 years). A good selection of interesting and amusing pieces. 7/10

Exoplanets: Hidden Worlds and the Quest for Extraterrestrial Life by Donald Goldsmith

A history of the discovery of exoplanets, covering the different groups, techniques and rivalries. Good although I got the people mixed up sometimes. 7/10

Save the Cat! : The Last Book on Screenwriting You’ll Ever Need by Blake Snyder

A guide to screenwriting with a few stories and observations on movies thrown in. Good even if you are just reading it for fun. 7/10

Being Mortal: Medicine and What Matters in the End by Atul Gawande

A book about geriatric and end-of-life care and choices. Lots of points about how risking all for aggressive treatment is often a very bad idea. Thought-provoking. 9/10

Ancient Alexandria: The History and Legacy of Egypt’s Most Famous City by Charles River Editors

Just a two hour long overview of the history. Covered the basic stuff and maybe worth skimming before you hit something meatier. 6/10

Vulcan 607 by Rowland White

The story of the long-distance bombing raids during the Falklands War. Lots of details on the history of the Vulcan, the crews, background and the actual missions. 9/10

101 Secrets For Your Twenties by Paul Angone

I really can’t remember this book well. I think it was okay but serves me right for getting months behind on reviews. On list for completeness. ?/10

December 02, 2019

LUV December 2019 Main Meeting: A review of Linux and Open Source in 2019

Dec 3 2019 19:00
Dec 3 2019 21:00
Location: 
Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

NOTE: The library closes at 7pm so arrivals after that time will need to contact Andrew on (0421) 775 358 or any other attendee for admission.

Speaker:  Alexar Pendashteh

A review of Linux and Open Source in 2019

This is the last main meeting of LUV in 2019!
In this meeting we are going to have a look at what 2019 had for Linux and Open Source and have a peek into what's coming up.
This event will be mainly a social event, with group discussion followed by a dinner in a nearby restaurant or cafe!

Many of us like to go for dinner nearby in Lygon St. after the meeting. Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.

December 3, 2019 - 19:00

November 25, 2019

Fixing Turris Omnia WiFi Quality

I was recently hoping to replace an aging proprietary router (upgraded to a Gargoyle FOSS firmware). After rejecting a popular brand with a disturbing GPL violation habit, I settled on the Turris Omnia router, built on free software. Overall, I was pretty satisfied with the fact that it is free and comes with automatic updates, but I noticed a problem with the WiFi. Specifically, the 5 GHz access point was okay but the 2.4 GHz was awful.

False lead

I initially thought that the 2.4 GHz radio wasn't working, but then I realized that putting my phone next to the router would allow it to connect and exchange data at a slow-but-steady rate. If I moved the phone more than 3-4 meters away though, it would disconnect for lack of signal. To be frank, the wireless performance was much worse than that of my original router, even though the wired performance was, as expected, amazing.

I looked on the official support forums and found this intriguing thread about interference between USB3 and 2.4 GHz radios. This sounded a lot like what I was experiencing (working radio but terrible signal/interference) and so I decided to see if I could move the radios around inside the unit, as suggested by the poster.

After opening the case, however, I noticed that the radios were already laid out in the optimal way and that USB3 interference wasn't going to be the reason for my troubles.

Real problem

So I took a good look at the wiring and found that while the larger radio (2.4 / 5 GHz dual-bander) was connected to all three antennas, the smaller radio (2.4 GHz only) was connected to only 2 of the 3 antennas.

To make it possible for antennas 1 and 3 to carry the signal from both radios, a duplexer got inserted between the radios and the antenna.

On one side is the 2.4 antenna port and on the other side is the 5 GHz port.

Looking at the wiring though, it became clear that my 2.4 GHz radio was connected to the 5 GHz ports of the two duplexers and the 5 GHz radio was connected to the 2.4 GHz ports of the duplexers. This makes sense considering that I had okay 5 GHz performance (with one of the three chains connected to the right filter) and abysmal 2.4 GHz performance (with neither of the two chains connected to the right filter).
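
The chain counting in the paragraph above can be written down as a toy model. It is a deliberately simplified sketch: it assumes each duplexer port passes only its own band, that antenna 2 is wired straight to the dual-band radio with no duplexer, and that the dual-band radio serves the 5 GHz access point. The function name `working_chains` is made up for illustration.

```python
def working_chains(radio_band, chain_ports):
    """Count chains whose filter port passes the radio's band.

    chain_ports lists, per antenna chain, the duplexer port the radio is
    wired to ("2.4" or "5"), or None for a direct antenna connection.
    """
    return sum(1 for port in chain_ports if port is None or port == radio_band)


# Wiring as found: each radio sits on the other band's duplexer ports.
print(working_chains("5", ["2.4", None, "2.4"]))   # dual-band radio on 5 GHz
print(working_chains("2.4", ["5", "5"]))           # 2.4 GHz-only radio

# Wiring after swapping the antenna connectors.
print(working_chains("5", ["5", None, "5"]))
print(working_chains("2.4", ["2.4", "2.4"]))
```

Under those assumptions the as-found wiring leaves one usable chain out of three on 5 GHz and none out of two on 2.4 GHz, while the swapped wiring restores all of them.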

Solution

Swapping the antenna connectors around completely fixed the problem. With the 2.4 GHz radio connected to the 2.4 side of the duplexer and the dual-bander connected to the 5 GHz side, I was able to get the performance I would expect from such a high-quality router.

Interestingly enough, I found the solution to this problem the same weekend as I passed my advanced amateur radio license exam. I guess that was a good way to put the course material into practice!

November 18, 2019

4K Monitors

A couple of years ago a relative who uses a Linux workstation I support bought a 4K (4096*2160 resolution) monitor. That meant that I had to get 4K working, which was 2 years of pain for me and probably not enough benefit for them to justify it. Recently I had the opportunity to buy some 4K monitors at a low enough price that it didn’t make sense to refuse so I got to experience it myself.

The Need for 4K

I’m getting older and my vision is decreasing as expected. I recently got new glasses and got a pair of reading glasses, as a reduced ability to change focus is common as you get older. Unfortunately I made a mistake when requesting the focus distance for the reading glasses and they work well for phones, tablets, and books but not for laptops and desktop computers. Now I have the option of either spending a moderate amount of money to buy a new pair of reading glasses or just dealing with the fact that laptop/desktop use isn’t going to be as good until the next time I need new glasses (sometime in 2021).

I like having lots of terminal windows on my desktop. For common tasks I might need a few terminals open at a time and if I get interrupted in a task I like to leave the terminal windows for it open so I can easily go back to it. Having more 80*25 terminal windows on screen increases my productivity. My previous monitor was 2560*1440 which for years had allowed me to have a 4*4 array of non-overlapping terminal windows as well as another 8 or 9 overlapping ones if I needed more. 16 terminals allows me to ssh to lots of systems and edit lots of files in vi. Earlier this year I had found it difficult to read the font size that previously worked well for me so I had to use a larger font that meant that only 3*3 terminals would fit on my screen. Going from 16 non-overlapping windows and an optional 8 overlapping to 9 non-overlapping and an optional 6 overlapping is a significant difference. I could get a second monitor, and I won’t rule out doing so at some future time. But it’s not ideal.

When I got a 4K monitor working properly I found that I could go back to a smaller font that allowed 16 non-overlapping windows. So I got a real benefit from a 4K monitor!
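
The arithmetic behind this can be sketched as pixels available per character cell for a grid of 80*25 terminals. This is a rough estimate that ignores window borders, title bars and panels, so real figures will be somewhat smaller; `cell_size` is just an illustrative name.

```python
def cell_size(screen_w, screen_h, grid_cols, grid_rows, term_cols=80, term_rows=25):
    """Pixels available per character cell for a grid of terminal windows.

    Ignores window borders, title bars and panels, so real values are smaller.
    """
    win_w = screen_w / grid_cols
    win_h = screen_h / grid_rows
    return win_w / term_cols, win_h / term_rows


for label, args in [
    ("2560*1440, 4*4 grid", (2560, 1440, 4, 4)),
    ("2560*1440, 3*3 grid", (2560, 1440, 3, 3)),
    ("4096*2160, 4*4 grid", (4096, 2160, 4, 4)),
]:
    width, height = cell_size(*args)
    print(f"{label}: {width:.1f} x {height:.1f} pixels per character")
```

By this estimate a 4*4 grid on the 4K screen gives each character more pixels than the 3*3 grid did at 2560*1440, which fits with being able to go back to 16 windows at a readable size.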

Video Hardware

Version 1.0 of HDMI, released in 2002, only supports 1920*1080 (FullHD) resolution. Version 1.3, released in 2006, supports 2560*1440. Most of my collection of PCIe video cards have a maximum resolution of 1920*1080 in HDMI, so it seems that they only support HDMI 1.2 or earlier. When investigating this I wondered what version of PCIe they were using; the command “dmidecode |grep PCI” gives that information. It seems that at least one PCIe video card supports PCIe 2 (released in 2007) but not HDMI 1.3 (released in 2006).
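
The resolution limits follow from the maximum pixel clock each HDMI generation can carry. The sketch below uses commonly quoted clock limits and standard mode timings; these figures are not from this post and should be treated as assumptions to check against the specifications, and the table and function names are made up for illustration.

```python
# Maximum TMDS pixel clock per HDMI generation, in MHz (commonly quoted figures).
HDMI_MAX_CLOCK_MHZ = {"1.0-1.2": 165, "1.3-1.4": 340, "2.0": 600}

# Pixel clocks for some 8-bit RGB modes, in MHz, including blanking intervals.
MODE_CLOCK_MHZ = {
    "1920*1080@60": 148.5,
    "2560*1440@60": 241.5,  # reduced-blanking timing
    "3840*2160@30": 297.0,
    "3840*2160@60": 594.0,
}


def versions_supporting(mode):
    """HDMI generations whose clock limit is high enough for `mode`."""
    needed = MODE_CLOCK_MHZ[mode]
    return [version for version, limit in HDMI_MAX_CLOCK_MHZ.items() if limit >= needed]


for mode in MODE_CLOCK_MHZ:
    print(mode, "->", ", ".join(versions_supporting(mode)))
```

On these numbers a card limited to 1920*1080 over HDMI is consistent with a pre-1.3 output, and 4K at 60 Hz needs more bandwidth than a 1.3 or 1.4 era port provides.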

Many video cards in my collection support 2560*1440 with DVI but only 1920*1080 with HDMI. As 4K monitors don’t support DVI input, that meant that when initially using a 4K monitor I was running at 1920*1080 instead of the 2560*1440 I had with my old monitor.

I found that one of my old video cards supported 4K resolution; it has an NVidia GT630 chipset (here’s the page with specifications for that chipset [1]). It seems that because I have a video card with 2G of RAM I have the “Kepler” variant, which supports 4K resolution. I got the video card in question because it uses PCIe*8 and I had a workstation that only had PCIe*8 slots and I didn’t feel like cutting a card down to size (which is apparently possible but not recommended). It is also fanless (quiet), which is handy if you don’t need a lot of GPU power.

A couple of months ago I checked the cheap video cards at my favourite computer store (MSY) and all the cheap ones didn’t support 4K resolution. Now it seems that all the video cards they sell could support 4K, by “could” I mean that a Google search of the chipset says that it’s possible but of course some surrounding chips could fail to support it.

The GT630 card is great for text, but the combination of it with an i5-2500 CPU (rating 6353 according to cpubenchmark.net [3]) doesn’t allow playing Netflix full-screen, and 1920*1080 videos scaled to full-screen sometimes trigger mplayer messages about the CPU being too slow. I don’t know how much of this is due to the CPU and how much is due to the graphics hardware.

When trying the same system with an ATI Radeon R7 260X/360 graphics card (PCIe*16 and draws enough power to need a separate connection to the PSU) the Netflix playback appears better but mplayer seems no better.

I guess I need a new PC to play 1920*1080 video scaled to full-screen on a 4K monitor. No idea what hardware will be needed to play actual 4K video. Comments offering suggestions in this regard will be appreciated.

Software Configuration

For GNOME apps (which you will probably run even if like me you use KDE for your desktop) you need to run commands like the following to scale menus etc:

gsettings set org.gnome.settings-daemon.plugins.xsettings overrides "[{'Gdk/WindowScalingFactor', <2>}]"
gsettings set org.gnome.desktop.interface scaling-factor 2

For KDE run the System Settings app, go to Display and Monitor, then go to Displays and Scale Display to scale things.

The Arch Linux Wiki page on HiDPI [2] is good for information on how to make apps work with high DPI (or regular screens for people with poor vision).

Conclusion

4K displays are still rather painful, both in hardware and software configuration. For serious computer use it’s worth the hassle, but it doesn’t seem to be good for general use yet. 2560*1440 is pretty good and works with much more hardware and requires hardly any software configuration.

November 17, 2019

Use swap on NVMe to run more dev KVM guests, for when you run out of RAM

I often spin up a bunch of VMs for different reasons when doing dev work and unfortunately, as awesome as my little mini-itx Ryzen 9 dev box is, it only has 32GB RAM. Kernel Samepage Merging (KSM) definitely helps, however when I have half a dozen or so VMs running and chewing up RAM, the kernel’s Out Of Memory (OOM) killer will start executing them, like this.

[171242.719512] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d435\x2dtest\x2dvm\x2dcentos\x2d7\x2d00.scope,task=qemu-system-x86,pid=2785515,uid=107
[171242.719536] Out of memory: Killed process 2785515 (qemu-system-x86) total-vm:22450012kB, anon-rss:5177368kB, file-rss:0kB, shmem-rss:0kB
[171242.887700] oom_reaper: reaped process 2785515 (qemu-system-x86), now anon-rss:0kB, file-rss:68kB, shmem-rss:0kB
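
To see how much memory each killed guest was actually holding, the "Out of memory: Killed process" lines can be pulled out of the kernel log. This is a minimal sketch that assumes the message format shown above (it varies between kernel versions); `parse_oom_kill` is a hypothetical helper name.

```python
import re

OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Extract pid, command and memory figures (kB) from an OOM-killer line."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "comm": match.group("comm"),
        "total_vm_kb": int(match.group("total_vm")),
        "anon_rss_kb": int(match.group("anon_rss")),
    }


# The second log line from the output above.
LINE = (
    "[171242.719536] Out of memory: Killed process 2785515 (qemu-system-x86) "
    "total-vm:22450012kB, anon-rss:5177368kB, file-rss:0kB, shmem-rss:0kB"
)

info = parse_oom_kill(LINE)
print(info["comm"], f"{info['anon_rss_kb'] / 1024 / 1024:.2f} GiB resident")
```

For the line above that is a little under 5 GiB of resident memory for a single qemu process.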

If I had more slots available (which I don’t) I could add more RAM, but that’s actually pretty expensive, plus I really like the little form factor. So, given it’s just dev work, a relatively cheap alternative is to buy an NVMe drive and add a swap file to it (or dedicate the whole drive). This is what I’ve done on my little dev box (actually I bought it with an NVMe drive so adding the swapfile came for free).

Of course the number of VMs you can run depends on the amount of RAM each VM actually needs for what you’re running on it. But whether I’m running 100 small VMs or 10 large ones, it doesn’t matter.

To demonstrate this, I spin up a bunch of CentOS 7 VMs at the same time and upgrade all packages. Without swap I could comfortably run half a dozen VMs, but more than that and they would start getting killed. With a 100GB swap file I am able to get about 40 going!
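
A back-of-the-envelope estimate shows why the swap file makes such a difference. The sketch below assumes a footprint of about 5GB per VM (loosely based on the resident size in the OOM log earlier) and ignores the host's own memory use and any savings from KSM; `vm_capacity` is an illustrative name.

```python
def vm_capacity(ram_gb, swap_gb, per_vm_gb):
    """Whole number of VMs whose memory fits in RAM plus swap."""
    return int((ram_gb + swap_gb) // per_vm_gb)


print(vm_capacity(32, 0, 5))    # 32GB RAM, no swap
print(vm_capacity(32, 100, 5))  # 32GB RAM plus a 100GB swap file
```

That gives 6 VMs without swap, in line with "half a dozen", and 26 with it. Getting about 40 in practice suggests that not every guest needs the full footprint at once and that KSM is reclaiming some duplicate pages.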

Even with pages swapping in and out, I haven’t really noticed any performance decrease and there is negligible CPU time wasted waiting on disk I/O when using the machines normally.

The main advantage for me is that I can keep lots of VMs around (or spin up dozens) in order to test things, without having to juggle active VMs or hope that they won’t actually use their memory and cause the kernel to start killing them. It’s not as seamless as extra RAM would be, but that’s expensive and I don’t have the slots for it anyway, so this seems like a good compromise.

November 16, 2019

DrupalSouth Diversity Scholarship Winner Announced

A few weeks ago we announced our diversity scholarship for DrupalSouth. Before announcing the winner I want to talk a bit about our experience doing this for the first time.

DrupalSouth is the largest Drupal event held in Oceania every year. It provides a great marketing opportunity for businesses wanting to promote their products and services to the Drupal community. Dave Hall Consulting planned to sponsor DrupalSouth to promote our new training business - Getting It Live training. By the time we got organised all of the (affordable) sponsorship opportunities had gone. After considering various opportunities around the event we felt the best way of investing a similar amount of money and giving something back to the community was through a diversity scholarship.

The community provided positive feedback about the initiative. However despite the enthusiasm and working our networks to get a range of applicants, we only ended up with 7 applicants. They were all guys. One applicant was from Australia, the rest were from overseas. About half the applicants dropped out when contacted to confirm that they could cover their own travel and visa expenses.

We are likely to offer other scholarships in the future. We will start earlier and explore other channels for promoting the program.

The scholarship has been awarded to Yogesh Ingale, from Mumbai, India. Over the last 3 years Yogesh has been employed by Tata Consultancy Services’ digital operations team as a DevOps Engineer. During this time he has worked with Drupal, Cloud Computing, Python and Web Technologies. Yogesh is interested in automating processes. When he’s not working, Yogesh likes to travel, automate things and write blog posts. Disclaimer: I know Yogesh through my work with one of my clients. Sometimes the Drupal community feels pretty small.

Congratulations Yogesh! I am looking forward to seeing you in Hobart.

If you want to meet Yogesh before DrupalSouth, we still have some seats available for our 2 day git training course that’s running on 25-26 November. If you won’t be in Hobart, contact us to discuss your training needs.

November 10, 2019

Database Tab Sweep

I miss a proper database related newsletter for busy people. There’s so much happening in the space, from tech, to licensing, and even usage. Anyway, quick tab sweep.

Paul Vallée (of Pythian fame) has been working on Tehama for some time, and now he gets to do it full time, as a PE firm bought control of Pythian’s services business. Pythian has more than 350 employees and 250 customers, and has raised capital before. More at Ottawa’s Pythian spins out software platform Tehama.

Database leaks data on most of Ecuador’s citizens, including 6.7 million children – ElasticSearch.

Percona has launched Percona Distribution for PostgreSQL 11. This means they have servers for MySQL, MongoDB, and now PostgreSQL. Looks very much like a packaged server with tools from 3rd parties (source).

Severalnines has launched Backup Ninja, an agent-based SaaS service to back up popular databases in the cloud. Backup.Ninja (cool URL) supports MySQL (and variants), MongoDB, PostgreSQL and TimeScale. No pricing available, but it is free for 30 days.

Comparing Database Types: How Database Types Evolved to Meet Different Needs

New In PostgreSQL 12: Generated Columns – anyone doing a comparison with MariaDB Server or MySQL?

Migration Complete – Amazon’s Consumer Business Just Turned off its Final Oracle Database – a huge deal as they migrated 75 petabytes of internal data to DynamoDB, Aurora, RDS and Redshift. Amazon, powered by AWS, and a big win for open source (a lot of these services are built on open source).

MongoDB and Alibaba Cloud Launch New Partnership – I see this as a win for the SSPL relicense. It is far too costly to maintain a drop-in compatible fork, in a single company (Hi Amazon DocumentDB!). Maybe if the PostgreSQL layer gets open sourced, there is a chance, but otherwise, all good news for Alibaba and MongoDB.

MySQL 8.0.18 brings hash join, EXPLAIN ANALYZE, and more interestingly, HashiCorp Vault support for MySQL Keyring. (Percona has an open source variant).

Some thoughts on Storytelling as an engineering teaching tool

Every week at work on Wednesday afternoons we have the SRE ops review, a relaxed two hour affair where SREs (& friends of, not all of whom are engineers) share interesting tidbits that have happened over the last week or so. This might be a great success, an outage, a weird case, or even a thorny unsolved problem. Usually these relate to a service the speaker is oncall for, or perhaps a dependency or customer service, but we also discuss major incidents both internal & external. Sometimes a recent issue will remind one of the old-guard (of which I am very much now a part) of a grand old story and we share those too.

Often the discussion continues well into the evening as we decant to one of the local pubs for dinner & beer, sometimes chatting away until closing time (probably quite regularly actually, but I'm normally long gone).

It was at one of these nights at the pub two months ago (sorry!), that we ended up chatting about storytelling as a teaching tool, and a colleague asked an excellent question, that at the time I didn't have a ready answer for, but I've been slowly pondering, and decided to focus on over an upcoming trip.

As I start to write the first draft of this post I've just settled in for the cruise on my first international trip in over six months[1], popping over to Singapore for the Melbourne Cup weekend. Whilst I'd intended this to be a holiday, I'm so terrible at actually having a holiday[2] that I've ended up booking two sessions of storytelling time, where I present the history of Google's production networks (for those of you reading this who are current or former engineering Googlers, similar to Traffic 101). It's with this perspective of planning, and having run those sessions, that I'm going to try and answer the question that I was asked.

Or at least, I'm going to split up the question I was asked and answer each part.

"What makes storytelling good"

On its own this is hard to answer, there are aspects that can help, such as good presentation skills (ideally keeping to spoken word, but simple graphs, diagrams & possibly photos can help), but a good story can be told in a dry technical monotone and still be a good story. That said, as with the rest of these items charisma helps.

"What makes storytelling interesting"

In short, a hook or connection to the audience, for a lot of my infrastructure related outage stories I have enough context with the audience to be able to tie the impact back in a way that resonates with a person. For larger disparate groups shared languages & context help ensure that I'm not just explaining to one person.

In these recent sessions one was with a group of people who work in our Singapore data centre, in that session I focused primarily on the history & evolution of our data centre fabrics, giving them context to understand why some of the (at face level) stranger design decisions have been made that way.

The second session was primarily people involved in the deployment side of our backbone networks, and so I focused more on the backbones, again linking with knowledge the group already had.

"What makes storytelling entertaining"

Entertaining storytelling is a matter of style, skills and charisma, and while many people can prepare (possibly with help) an entertaining talk, the ability to tell an entertaining story off the cuff is more of a skill, luckily for me, one I seem to do ok with. Two things that can work well are dropping in surprises, and where relevant some level of self-deprecation, however both need to be done very carefully.

Surprises can work very well when telling a story chronologically "I assumed X because Y, <five minutes of waffling>, so it turned out I hadn't proved Y like I thought, so it wasn't X, it was Z", they can help the audience to understand why a problem wasn't solved so easily, and explaining "traps for young players" as Dave Jones (of the EEVblog) likes to say can themselves be really helpful learning elements. Dropping surprises that weren't surprises to the story's protagonist generally only works if it's as a punchline of a joke, and even then it often doesn't.

Self-deprecation is an element that I've often used in the past, however more recently I've called others out on using it, and have been trying to reduce it myself, depending on the audience you might appear as a bumbling success or stupid, when the reality may be that nobody understood the situation properly, even if someone should have. In the ops review style of storytelling, it can also lead to a less experienced audience feeling much less confident in general than they should, which itself can harm productivity and careers.

If the audience already had relevant experience (presenting a classic SRE issue to other SREs for example, a network issue to network engineers, etc.) then audience interaction can work very well for engagement. "So the latency graph for database queries was going up and to the right, what would you look at?" This is also similar to one of the ways to run a "wheel of misfortune" outage simulation.

"What makes storytelling useful & informative at the same time"

As with interest, making storytelling useful & informative involves consideration for the audience; as a presenter, it helps if you know the audience, at least in broad strokes. As I mentioned above, when I presented my talk to a group of datacenter-focused people I focused on the DC elements, connecting history to the current incarnations; when I presented to a group of more general networking folk a few days later, I focused more on the backbones and other elements they'd encountered.

Don't assume that a story will stick wholesale, just leaving a few keywords, or even just a vague memory with a few key words they can go digging for can make all the difference in the world. Repetition works too, sharing many interesting stories that share the same moral (for an example, one of the ops review classics is demonstrations about how lack of exponential backoff can make recovery from outages hard), hearing this over dozens of different stories over weeks (or months, or years...) it eventually seeps in as something to not even question having been demonstrated as such an obvious foundation of good systems.
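For readers who haven't sat through those ops reviews, the exponential backoff moral can be sketched in a few lines. This is a minimal illustration, not code from any of the stories: the function names, the default delays and the use of random jitter are all assumptions made for the example.

```python
import random
import time


def backoff_delays(attempts, base=0.5, cap=60.0, rng=random.random):
    """Yield one delay per retry: doubling each time, capped, with random jitter.

    All parameter names and defaults here are illustrative assumptions.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Jitter spreads clients out so they don't all retry at the same instant.
        yield rng() * ceiling


def call_with_backoff(operation, attempts=5, sleep=time.sleep, **delay_options):
    """Call operation(), retrying on any exception with growing delays."""
    delays = list(backoff_delays(attempts - 1, **delay_options))
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, let the caller see the failure
            sleep(delays[attempt])
```

Without the growing delay, every client hammers a recovering service at a constant rate, which is exactly the situation that makes recovery hard.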

When I'm speaking to an internal audience I'm happy if they simply remember that I (or my team) exist and might be worth reaching out to in future if they have questions.

Lastly, storytelling is a skill you need to practice. Whether it's a keynote presentation in front of a few thousand people, or just telling tall tales to some mates at the pub, practice helps, and eventually many of the elements I've mentioned above become almost automatic. As can probably be seen from this post I could do with some more practice on the written side.

1: As I write these words I'm aboard a Qantas A380 (QF1) flying towards Singapore, the book I'm currently reading, of all things about mechanical precision ("Exactly: How Precision Engineers Created the Modern World" or as it has been retitled for paperback "The Perfectionists"), has a chapter themed around QF32, the Qantas A380 that notoriously had to return to Singapore after an uncontained engine failure. Both the ATSB report on the incident and the captain Richard de Crespigny's book QF32 are worth reading. I remember I burned through QF32 one (very early) morning when I was stuck in GlobalSwitch Sydney waiting for approval to repatch a fibre, one of the few times I've actually dealt with the physical side of Google's production networks, and to date the only time the fact I live just a block from that facility has been used at all sensibly.

2: To date, I don't think I've ever actually had a holiday that wasn't organised by family, or attached to some conference, event or work travel I'm attending. This trip is probably the closest I've ever managed (roughly equal to my burnout trip to Hawaii in 2014), and even then I've ruined it by turning two of the three weekdays into work. I'm much better at taking breaks that simply involve not leaving home or popping back to stay with family in Melbourne.

November 04, 2019

Audiobooks – August 2019

Periodic Tales: The Curious Lives of the Elements by Hugh Aldersey-Williams

Various depths of coverage (usually by interest of the story) of the discovery, usage and literature/cultural impact around each of the elements. 8/10

Born to Run by Bruce Springsteen

Autobiography read by the author. Covers his whole career and personal life. Well written and lots of details and insight. Well read too. 9/10

The Admirals: Nimitz, Halsey, Leahy, and King – The Five-Star Admirals Who Won the War at Sea by Walter R. Borneman

A Biography of the 5-star Admirals and the interactions of their careers before and during World War 2. 7/10

Because Internet: Understanding the New Rules of Language by Gretchen McCulloch

I really can’t remember this book (serves me right for delaying reviews). I think it was okay though. [67]/10

The 4% Universe: Dark Matter, Dark Energy, and the Race to Discover the Rest of Reality by Richard Panek

Pretty much what the subtitle says. Worked fairly well at keeping the different people distinct and technical explanations made sense. 7/10

The Unopened casebook of Sherlock Holmes written by John Taylor with Simon Callow as Sherlock Holmes and Nicky Henson as Dr Watson

6 audioplay stories. Quality is okay although I detected a theme with the villains. 7/10

Best. Movie. Year. Ever: How 1999 Blew Up the Big Screen by Brian Raftery

A run through of the great (and a few not) movies that came out in 1999. Some backstories on many with industry and world news from the year. 8/10

November 03, 2019

KMail Crashing and LIBGL

One problem I’ve had recently on two systems with NVIDIA video cards is KMail crashing (SEGV) while reading mail. Sometimes it goes for months without having problems, and then it gets into a state where reading a few messages (or sometimes reading one particular message) causes a crash. The crash happens somewhere in the Mesa library stack.

In an attempt to investigate this I tried running KMail via ssh (as that precludes a lot of the GL stuff), but that crashed in a different way (I filed an upstream bug report [1]).

I have discovered a workaround for this issue: setting the environment variable LIBGL_ALWAYS_SOFTWARE=1 makes things work. At this stage I can’t be sure exactly where the problems are. As it’s certain KMail operations that trigger it I think that’s evidence of problems originating in KMail, but the end result when it happens often includes a kernel error log so there’s probably a problem in the Nouveau driver. I spent quite a lot of time investigating this, including recompiling most of the library stack with debugging mode, and didn’t get much of a positive result. Hopefully putting it out there will help the next person who has such issues.

Here is a list of environment variables that can be set to debug LIBGL issues (strangely I couldn’t find documentation on this when Googling it). If you are stuck with a problem related to LIBGL you can try setting each of these to “1” in turn and see if it makes a difference. That can either be for the purpose of debugging a problem or creating a workaround that allows you to run the programs you need to run. I don’t know why GL is required to read email.

LIBGL_DIAGNOSTIC
LIBGL_ALWAYS_INDIRECT
LIBGL_ALWAYS_SOFTWARE
LIBGL_DRI3_DISABLE
LIBGL_NO_DRAWARRAYS
LIBGL_DEBUG
LIBGL_DRIVERS_PATH
LIBGL_DRIVERS_DIR
LIBGL_SHOW_FPS
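
Trying each variable in turn can be scripted. The following is only a sketch of that idea: the helper names are made up for this example, and it skips the two path variables (LIBGL_DRIVERS_PATH and LIBGL_DRIVERS_DIR) on the assumption that they want a directory rather than “1”.

```python
import os
import subprocess

# On/off style variables from the list above. The two *_PATH/*_DIR
# variables are left out because "1" is not a meaningful directory.
LIBGL_TOGGLES = [
    "LIBGL_DIAGNOSTIC",
    "LIBGL_ALWAYS_INDIRECT",
    "LIBGL_ALWAYS_SOFTWARE",
    "LIBGL_DRI3_DISABLE",
    "LIBGL_NO_DRAWARRAYS",
    "LIBGL_DEBUG",
    "LIBGL_SHOW_FPS",
]


def env_with(variable, base=None):
    """Copy the environment with exactly one LIBGL toggle set to "1"."""
    env = dict(os.environ if base is None else base)
    for name in LIBGL_TOGGLES:
        env.pop(name, None)  # make sure only one toggle is active at a time
    env[variable] = "1"
    return env


def try_each(command):
    """Run command once per toggle and collect the exit codes.

    On Linux a negative code means the process died from a signal,
    e.g. -11 for SIGSEGV.
    """
    results = {}
    for variable in LIBGL_TOGGLES:
        finished = subprocess.run(command, env=env_with(variable))
        results[variable] = finished.returncode
    return results
```

Calling try_each(["kmail"]) would start KMail once per variable; you still have to reproduce the crash by hand in each run, so this only saves the typing.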

November 01, 2019

LUV November 2019 Main Meeting: nfq - an ad blocker that runs on the router

Nov 6 2019 19:00
Nov 6 2019 21:00
Location: 
Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

NOTE: This month's meeting will be on WEDNESDAY night due to the Melbourne Cup public holiday.  The library closes at 7pm so arrivals after that time will need to contact Andrew on (0421) 775 358 or any other attendee for admission.

Speaker:  Duncan Roe, nfq - an ad blocker that runs on the router

Many of us like to go for dinner nearby after the meeting, typically at Brunetti's or Trotters Bistro in Lygon St.  Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.

November 6, 2019 - 19:00

LUV November 2019 Workshop: Replacing Windows 7 with Linux

Nov 16 2019 12:30
Nov 16 2019 16:30
Location: 
Infoxchange, 33 Elizabeth St. Richmond

Replacing Windows 7 with Linux

What to do with your Windows 7 PC when its EOL arrives in January next year?  Install Linux of course!  Wen Lin will lead this talk with an intro, then get everyone to join in for a Q&A - let's share all the great ideas (and personal experience) on how to install a variety of Linux Distros to replace one's obsolete Win7 - and breathe new life into one's PC.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121.  Late arrivals please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.

November 16, 2019 - 12:30

October 29, 2019

Buying an Apple Watch for 7USD

For DrupalCon Amsterdam, Srijan ran a competition with the prize being an Apple Watch 5. It was a fun idea. Try to get a screenshot of an animated GIF slot machine showing 3 matching logos and tweet it.

Try your luck at @DrupalConEur Catch 3 in a row and win an #AppleWatchSeries5. To participate, get 3 of the same logos in a series, grab a screenshot and share it with us in the comment section below. See you in Amsterdam! #SrijanJackpot #ContestAlert #DrupalCon

I entered the competition.

I managed to score 3 of the no logo logos. That's gotta be worth something, right? #srijanJackpot

The competition had a flaw. The winner was selected based on likes.

After a week I realised that I wasn’t going to win. Others were able to garner more likes than I could. Then my hacker mindset kicked in.

I thought I’d find out how much 100 likes would cost. A quick search revealed likes cost pennies apiece. At this point I decided that instead of buying an easy win, I’d buy a ridiculous number of likes. 500 likes only cost 7USD. Having a blog post about gaming the system was a good enough prize for me.

Receipt: 500 likes for 7USD

I was unsure how things would go. I was supposed to get my 500 likes across 10 days. For the first 12 hours I got nothing. I thought I’d lost my money on a scam. Then the trickle of likes started. Every hour I’d get 2-3 likes, mostly from Eastern Europe. Every so often I’d get a retweet or a bonus like on a follow up comment. All up I got over 600 fake likes. Great value for money.

Today Srijan awarded me the watch. I waited until after they’d finished taking photos before coming clean. Pics or it didn’t happen and all that. They insisted that I still won the competition without the bought likes.

The prize being handed over

Think very carefully before launching a competition that involves social media engagement. There’s a whole fake engagement economy.

October 27, 2019

FreeDV between Argentina and the UK

Jose (LU5DKI) has been in daily contact with a group of UK Hams including Eric (GW8LJJ), Cess (GW3OAJ) and Steve (G7HZI). They are using FreeDV 700D over a novel combination of HF radio channels and the Internet via SDRs.

Jose transmits from his station in Argentina to a KiwiSDR in Santiago, Chile, around 1500km away. The UK hams listen to this SDR over the Internet. To receive, Jose listens to a KiwiSDR in the UK. The combination of the Internet and HF radio gives them reliable communications at a time when long distance band conditions are poor.

Thanks Jose for the video. You can see the “barber pole” HF fading on the signal from the UK.

Several of the UK Hams are using SM1000s running the new v2 firmware that includes FreeDV 700D. Good to see that working well in the field.

FreeDV 1.4 includes 700C/700D improvements, and the new FreeDV 2020 mode. I hope to release FreeDV 1.4 later this year. However it’s already working quite well (just a few small issues to go), so if you would like to try a Windows development version of FreeDV 1.4, please contact me. For Linux users, it’s quite easy to compile from source.

October 22, 2019

DevOpsDays NZ 2019 – Day 2 – Session 3

Everett Toews – Is GitOps worthy of the [BuzzWord]Ops moniker?

  • Usual Git workflow
  • But it takes some action
  • Applying desired state from Git
  • Example: Infrastructure as code
    • DNS
    • Onboarding and offboarding
  • Git is now a SPOF
  • Change Management Dept is now a barrier
  • Integrate with ITSM
  • Benefits: Self-service, Compliance
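
The “applying desired state from Git” idea in the notes above can be sketched as a comparison between what the repository says and what is actually running. This is an illustration only, not anything shown in the talk: the function names and the DNS-style records are made up, and a real setup would read the desired state from a checked-out repository and call a provider API.

```python
def plan_changes(desired, actual):
    """Compare desired state (as committed to Git) with actual state.

    Both arguments are plain dicts, e.g. DNS name -> address.
    Returns a list of (verb, name, value) actions.
    """
    actions = []
    for name, value in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, value))
        elif actual[name] != value:
            actions.append(("update", name, value))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, actual[name]))
    return actions


def apply_changes(actual, actions):
    """Return the state that results from applying the planned actions."""
    new_state = dict(actual)
    for verb, name, value in actions:
        if verb == "delete":
            new_state.pop(name)
        else:
            new_state[name] = value
    return new_state
```

Running the plan again after applying it should produce an empty list, which is the property that makes the Git repository the source of truth.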

Joel Wirāmu Pauling – Why Bare Metal still matters

  • Cloud Native Dev doesn’t exist as a closed system
  • IoT is all hardware
  • AI/ML is using special hardware
  • Networks is all hardware offloads
  • FPGAs and ASICs need a more standard, open way to access them
  • You’ll always have weird stuff on your network
  • Virtualization has abstracted away the real
  • We care about vendor lock-in with cloud APIs, and Aus electricity isn’t all that green

Steven Ensslen – Do you have a data quality problem?

  • What is data ops and why do we want it?
  • People think they have a data quality problem but they don’t actually measure it to see how bad.
  • Causes all sorts of problems.
  • 3 Easy steps to fix data quality
  • 1 – Document data characteristics and train people to know them
  • 2 – Monitor data as if it is infrastructure
    • Test data like it is code
  • 3 – Professionalize your support of data professionals
    • Bring in the spreadsheet experts
    • Support reporting and analytics people too
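
Steps 1 and 2 above (document the characteristics, then test data like code) could look something like the following. It is a sketch under assumptions: the rule format, the column names and the helper functions are invented for illustration and were not part of the talk.

```python
# Documented characteristics: column -> (description, check).
# These two rules are examples only.
RULES = {
    "email": ("must contain @", lambda v: isinstance(v, str) and "@" in v),
    "age": ("must be between 0 and 130", lambda v: isinstance(v, int) and 0 <= v <= 130),
}


def check_rows(rows, rules):
    """Return (row_index, column, description) for every rule a row breaks."""
    failures = []
    for index, row in enumerate(rows):
        for column, (description, is_valid) in rules.items():
            if not is_valid(row.get(column)):
                failures.append((index, column, description))
    return failures


def failure_rate(rows, rules):
    """Fraction of rows with at least one failure: a number you can graph and alert on."""
    bad_rows = {index for index, _, _ in check_rows(rows, rules)}
    return len(bad_rows) / len(rows) if rows else 0.0
```

The point of failure_rate is the one made in the talk: measure how bad the problem is, rather than just believing you have one.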

Mandi Buswell – What are Kubernetes Operators and Why do I care

  • Like an App Store on your kubernetes cluster
  • Like a little Kubernetes robot doing that hard work for you. Lifecycle management
  • Operators run as microservices on the kubernetes cluster
  • operatorhub.io
  • Work on any kubernetes cluster
  • You can even write your own

Laura Bell – Securing the systems of the future

  • Fear and Loathing
    • It is an old problem because “People are Jerks”
  • All organizations try either Fight, Flight, Freeze
  • Trying to protect: Confidentiality, Integrity, Availability
  • Protect, Detect, Respond
  • Monolith
    • A big wall around
    • Layered defense is better but not the final solution
    • Defensive software architecture is not just prevention
    • Castles had lots of layers of defenses. Some prevention, Some Detection, Some response
  • Microservices
    • Look at something in the middle of a star and erase it
    • Push malicious code into deployment pipelines
  • Avoid scar tissue, stuff put in just to avoid specific previous problems. Make you feel safe but without any real evidence.
  • Fearless security patterns and approaches
  • Technology is changing but the basics are still the same
  • Lots of techniques in computer security.
  • Prevention and Detection are interchangeable
  • Batman vs Meerkat model
  • Be Aware and challenge your own bubble
  • Supply Chains are vulnerable: Integrations, dependencies, Data Sources
  • Determinate threat vs Dynamic Threat
    • Can’t predict which steps in which order are going to get the result
    • Compromise the data then the engine will return bad results
  • Plug for opensecurity.nz

DevOpsDays NZ 2019 – Day 2 – Session 2

Jacob Ivester – Diagnose DevOps: The work behind the work

  • Unhappy DevOps Family
    • Unsupported Software
    • Releases outside of primetime
    • etc
  • Focus on Process as a common problem
    • Manage Change that Affects Multiple teams
    • Throughputs vs Outputs
  • Repeatability
  • Extensibility
  • Visibility
  • Safety

Cameron Huysmans – Designing an Enterprise Secrets Management Service using HashiCorp Vault

  • Australian based Bank
  • Transition for last 30 years for a bank to a layered based security model (all the way down to the server in the datacentre)
  • In 2017 moved to the cloud and infrastructure in the cloud
  • What makes a bank – licensed to operate
    • Must demonstrate control of the process
    • Reports problems to regulator
    • Identifiable business Processes
    • All Humans
  • If you use a pipeline there are no humans in the process. These machine processes need to conform to the same controls
    • Architecture naturally resistant to change. Change requires a complex process
    • ITIL
    • 2FA required for everything
    • Secrets everywhere
  • Disruption
    • Dynamic Systems with constant updates
    • Immutable containers
    • Changes done via code
    • Live system changes
    • Code and automation drives things
    • Dynamic CMDB – High Levels of abstraction
    • But you still have a secrets problem
  • Secrets Management
    • Not just a place to store passwords
    • But also a Chain of Trust
  • If Pipelines make the change who owns it, who audits it?
  • Vault becomes a bit of audit by saying who used something (person or process)
  • Why another tool ?
  • Created a pattern for how things will be deployed. Got Security to okay it. Built it in a pipeline
  • Vault placed in the highest security area
    • But less-secure areas needed to talk to it.
    • Lots of zones internally. Some in Cloud, DMZ
    • Some talk via API gateway to main vault
    • Had a Vault replica that had a copy of some secrets and could be used by those zones that were not allowed to talk to the secrets zone
  • Learnings
    • This is hard, especially in the cloud
    • If Pipelines are doing the change, that must be kept secure. Attribution, notification and real-time analytics
    • Declarative manifests of change (code, scripts, tools) require more strict access controls
    • Avoid direct point-to-point connections
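
The “who used something (person or process)” audit idea from these notes can be illustrated with a toy store. To be clear about the assumptions: this is not HashiCorp Vault or its API, just an in-memory stand-in with invented class and method names, written to show why every read, allowed or denied, ends up attributed to an identity.

```python
import datetime


class AuditedSecretStore:
    """Toy secret store: every read attempt is recorded with who asked."""

    def __init__(self, clock=None):
        self._secrets = {}
        self._grants = set()
        self.audit_log = []
        # The clock can be replaced in tests; by default use UTC now.
        self._clock = clock or (lambda: datetime.datetime.now(datetime.timezone.utc))

    def put(self, path, value):
        self._secrets[path] = value

    def grant(self, identity, path):
        """Allow a person or a pipeline identity to read one path."""
        self._grants.add((identity, path))

    def read(self, identity, path):
        allowed = (identity, path) in self._grants and path in self._secrets
        # Log before returning or raising, so denied attempts are kept too.
        self.audit_log.append(
            {"time": self._clock(), "identity": identity, "path": path, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"{identity} may not read {path}")
        return self._secrets[path]
```

When a pipeline rather than a human makes the change, the identity in that log is what answers the “who owns it, who audits it?” question.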

October 21, 2019

DevOpsDays NZ 2019 – Day 2 – Session 1

Cath Jones – The Myth of the Senior Engineer

  • They won’t be able to hit the ground running on Day 1
    • Assume they know everything about how things work at your organisation that is organisation or industry-specific
    • If you don’t account for this you will see problems, stress, high turnover
  • Example: Trial by Fire
    • You get shown the basic stuff and then given your first ticket
  • How do you take organisation knowledge and empower people?
  • Employee Socialisation
    • Helps mitigate problems and assumptions
    • Facilitates communication and networking
    • Allows people to begin contributing sooner
  • Pre-Arrival Stage
    • Let people know what is expected
    • Let existing people know who is starting and our expectations for them
    • Example: Automattic (WordPress)
      • Asked people in the final stages to complete some (paid) work.
      • Candidates get better understanding of the company
  • Preparing for Transition
    • Culture-shock
    • What are you like compared to where they came from?
    • The new role compared to their previous one?
    • Come from a place where they were an expert and had lots of domain-specific knowledge to being a newbie
  • The Encounter Stage
    • Mentoring, Communication, Technical onboarding
    • Example: Cohorts of new hires
    • Mentoring: Proven way to socialise Senior engineers. Can be Labour intensive but helps when documentation lacking
    • Share Mentor-ship responsibilities: eg Technical and Organisational mentor separate
    • Communication: Expectations that company places, how privileged and how transparent?
    • Authenticity: Can people be themselves. Reduces stress
  • Technical onboarding: Needs to take time and do it properly. Allow new people to contribute back to it and make it better.
    • Pick out easy wins or low-hanging fruit so people can contribute sooner
    • Have Style Guides and good docs
  • MetaMorphosis
    • Senior Engineers are fully Contributing

Katie McLaughlin – Being kind to 3am you

DevOpsDays NZ 2019 – Day 1 – Session 3

Gleidson Nascimento – Packaging OpenShift Origin Kubernetes Distribution (OKD)

  • Centos SIG
  • Based on latest upstream

Joshua King – Don’t Reinvent the Wheel, Just Realign It

  • Project: Let notifications work for powershell users
  • Then he found the UWP community toolkit
  • Which had notifications built-in
  • These days looks around first, asks for APIs rather than scraping
  • Look around for open-source tools and give back
  • Sometimes your implementation might be fun or even better than the original

Srdan Dukic – Implicit trust agreement in Learning Organizations

  • Sysadmin shell -> ansible -> APIs -> automate everything
  • Programmers coded themselves out of a job
  • Follow instructions or achieve results?
  • A bit of both – tension between the two
  • Money today or Money tomorrow?
  • Employee – Expected to make things better
  • Employer – Support things getting better, not fire people when they automate themselves out of a job

Julie Gunderson – You Can’t Buy DevOps

  • Lots of companies talking about DevOps are trying to sell you a solution
  • What doesn’t make you a devops company
    • Be in the Cloud
    • Have a DevOps team
    • Get rid of the Ops Team
    • A checklist you can tick off
    • Easy
  • Westrum 3 Cultures Model
  • We want the generative model
  • Keeping information flowing between teams is prerequisite for high performance teams
  • Psychological Safety to make decisions. Lets employees focus on problems and getting work done rather than politics
  • Practices
    • Configuration management
    • CICD Pipelines
    • Work in small batches
    • Test every commit and everything else (look at Chaos engineering)
  • Tools
    • Let the teams who are using the tools decide on what tools they will use
    • XebiaLabs Periodic table of DevOps tools
  • Getting there
    • Start with one team and a POC

DevOpsDays NZ 2019 – Day 1 – Session 2

Allen Geer, Michael Harrod – Kiwi Ingenuity – Kiwis can Overcome Tough Problems In DevOps

  • Contrast – US vs NZ
    • In the US companies are bigger, lots more people, lots more money to throw at problems.
    • Contrast with Aerial Topdressing pioneered in NZ using surplus WW2 aircraft
    • Since the problems are up to 100x bigger in the US the tools are designed for that scale. ROI might not be there for smaller companies.
    • Dealing with Scale
      • Avoid “Shiny new thing” syndrome, plan for keeping things for at least 5 years.
      • Ramp up slowly with the tool, push it into other areas.
      • Avoid Single Person Silo.
      • Bring up some Kiwi Ingenuity (Look at Open source, Use the Free Tiers or Cheap Tiers).
      • Out-Innovate the US companies rather than trying to out-scale
    • Infrastructure: Monetization of Toil
      • Spending time and money on stuff you can automate
      • Lots of manual creating of infrastructure, servers, firewalls.
      • Lack of incentive for providers who charge for changes to automate stuff
      • Other Providers will automate (especially overseas ones that will come into NZ)
      • People take risks (eg no DR) in order to save money.
      • Innovator’s Dilemma
    • Solutions
      • More vocal customers
      • Providers should provide a platform, lots more self-service. Hand-holding for the hard stuff not the day-to-day
      • Charge for outcomes not person-hours
      • Begin Small
      • It’s an experiment – Freedom to Fail
    • Inattentive Customer Service
      • Overseas companies have a lot more forums, helpdesks, quick responses.
      • “Kiwis reluctant to make a fuss” , Companies not used to people making a fuss
      • Apply “American Ingenuity” – Striving focus to increase customer satisfaction.
      • Build a healthy community (eg online forums) around your service.
      • Gather insights from customers
      • Bezos – “When a customer contacts us, we see this as a defect” . Focus on the source of problems
    • Evolving Kiwi Workforce
      • NZ has older and aging workforce. 2nd oldest in the OECD
      • Slightly Fewer people with degrees
      • 11% of workforce 65+ by 2038
    • Learning in the workplace
      • Leverage senior Knowledge
      • Telco – Older customers didn’t want to approach young workers in mall. Brought in retired engineers to work in stores.
      • Mentoring and reverse-mentoring. Mentor learns insights from mentee too (eg about younger people’s habits)
    • Introducing people to DevOps
      • Kiwi DevOps models

Craig Box – Teaching Old Servers New Tricks: extending the service mesh outside the cluster

  • Service Mesh
    • Managing a service is hard
    • metrics, monitoring, logging, tracing
    • AAA encryption, certs
    • load balancing, routing, network policy
    • quota
    • Failure handling, fault injection
  • Microservices
    • Not just for hipsters
    • Works best at scale. Lots of devs
  • Now introduce a network in between everything. Lots of hard stuff, distributed systems are hard
  • Leaky abstractions
    • Have to build stuff into microservice to deal with problems of the network
    • In multiple libraries and languages
    • Can we fix it?
  • Sidecar Pattern
    • The sidecar does all the hard stuff instead of making the microservice itself do it.
    • Talks TCP. Able to work with all languages
  • Proxies as sidecars
    • SPOF
    • Sidecar is attached to each MS
  • Flexibility and Power
    • Single place where we can do everything
    • Traffic going in: TLS termination, metrics, quota
    • Traffic out of workloads: Authentication, TLS connections
  • Istio
    • Open platform
    • Not always microservices
    • Uniform observability
    • Operational Agility
    • Policy driven Security
  • How istio works
    • Proxies + control plane
    • Pilot in control plane pushes config to proxies, keeps track of them, looks up stuff in k8s cluster
    • Mixer – policy check and telemetry
    • Citadel – cert authority to proxies
    • Control plane has to run on k8s
    • Proxies run using envoy
    • Zipkin built in
    • All done automatically for kubernetes environments (admission controller adds sidecar)
  • Adding a VM to a service mesh
    • Enable the mesh expansion, connect the networks
    • Add the gateway IP to the VM
    • Get a cert and copy to the VM
    • Install proxy and node agent
    • Traffic from cluster -> VM
      • Add the service to DNS in the cluster
      • Create a ServiceEntry on the cluster
    • Traffic from VM -> Cluster service
      • Add Service and IP to /etc/hosts on the VM
  • Sample Application – Hipster Shop
    • productcatalogservice is outside of kubernetes
    • headless service in kubernetes
    • manually created service entry in k8s
    • Experimental istio commands to simplify process to single command

October 20, 2019

DevOpsDays NZ 2019 – Day 1 – Session 1

Brooke Treadgold – Back to Basics

  • Transformation Lead ANZ Bank
  • Not originally from a Tech background
  • Tech has a lot of buzzwords and acronyms that make it an exclusive club. Improvements rely on people from other parts of the business that aren’t in that club
  • These people have to care about it and understand it.
  • Had to use terms that everybody in the business understood and related to.
  • Case for change – What top orgs do:
    • 208 times more frequent deployments
    • 2604 times faster to recover from incidents
    • 7 times lower change failure rate
  • What you need
    • High Priority -> Access to people to do the work
    • Needed tangible goal (weekly releases) to get people to focus (and pay)
  • Making change a reality
    • Risk Management
      • You can just stop doing the reports
      • You need to gain their trust in order to get influence
      • Have to take them along the way with the changes
    • Empathy
    • Influence
  • History at ANZ
    • First pipeline replaced just one document
      • Explained to change management team how the pipeline could replace the traditional plan
    • Rethink of Change Plan and Outcome Reports
      • Other teams needed these for confidence in the change
      • Found out what people actually cared about, found better ways to provide that information (confidence) in an automated way
    • Security Assessment
      • Traditionally required a big document filled in and signed off
      • Found that this was only required for “Significant” changes
      • Got a definition of what significant means so didn’t need to do this.
    • High Risk Change Records
      • Lots of paperwork for High Risk changes
      • Decided that these are not high risk changes so lots less work
      • Templated them so a lot easier to do

Charles Korn – Dockerised local build and testing environments made easy

  • Go Script – A single script, kept in a consistent place in all your repos, that does the basic functions: install, help, run, deploy
  • batect – tool he wrote
    • dockerized dev environment plus a Go Script
  • Dev environment
    • Build env: code to an artifact
    • Testing Environments. Fake stuff, lots of different levels
  • Build Environment
    • Container with the build tools. Mount our code directory into this
    • Isolation brings consistency and repeatability. No more “works on my machine”
    • Clean container every single time we run a build
    • CI agents just need docker since teams will provide the container
    • Ease of Onboarding. Just get git and docker installed
    • Ease of change. Environment and tasks defined in yaml and versioned like everything else. New version downloaded. Kept in sync with actual code
  • Test Environments
    • You can run local tests
    • Consistently runs test on CI
    • Have to launch multiple containers for more complex tests, using built in docker definitions and health checks and networking
  • Path to Production
    • If deploying docker then can use same image
    • But works with stuff that isn’t deployed as docker too
  • What about docker compose?
    • Better performance
    • Model – tasks are a first class citizen – Doesn’t feel like you are fighting the tool.
    • Better UI and developer experience. Updates managed automatically
    • Cleans up better after each run
    • It just works. Works with proxies better. Works with file permissions better.
  • How to get started?
    • start small, work incrementally
    • Start with the build environment
    • With the Test env work through one piece at a time.
    • Reuse components
    • Take advantage of other people’s images. Lots of mocks for cloud services.
    • Docker has library of health check scripts
    • Bunch of sample scripts for batect
  • github.com/charleskorn/batect
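
The "Go Script" idea from the notes above could be sketched roughly as follows. This is only an illustration, not batect's own wrapper: the function names (go_main, task_install and so on) and the placeholder task bodies are hypothetical, and only the task names (install, help, run, deploy) come from the talk notes.

```shell
#!/usr/bin/env bash
# Hypothetical "go script": one entry point kept at the same path in every repo.
# The task bodies are placeholders; a real project would call its own tooling here.

task_install() { echo "installing dependencies"; }
task_run()     { echo "running the application"; }
task_deploy()  { echo "deploying the application"; }
task_help()    { echo "usage: ./go {install|run|deploy|help}"; }

# Dispatch the first argument to the matching task.
# No argument prints the help text; an unknown task prints help and fails.
go_main() {
  case "${1:-help}" in
    install) task_install ;;
    run)     task_run ;;
    deploy)  task_deploy ;;
    help)    task_help ;;
    *)       task_help >&2; return 1 ;;
  esac
}

go_main "$@"
```

Saved as an executable file named go in the repository root, this would be invoked as ./go install, ./go run and so on, which gives new team members the same starting point in every project.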

October 14, 2019

Network Maintenance

To my intense amazement, it seems that NBN Co have finally done sufficient capacity expansion on our local fixed wireless tower to actually resolve the evening congestion issues we’ve been having for the past couple of years. Where previously we’d been getting 22-23Mbps during the day and more like 2-3Mbps (or worse) during the evenings, we’re now back to 22-23Mbps all the time, and the status lights on the NTD remain a pleasing green, rather than alternating between green and amber. This is how things were way back at the start, six years ago.

We received an email from iiNet in early July advising us of the pending improvements. It said:

Your NBN™ Wireless service offers maximum internet speeds of 25Mbps download and 5Mbps upload.

NBN Co have identified that your service is connected to a Wireless cell that is currently experiencing congestion, with estimated typical evening speeds of 3~6 Mbps. This congestion means that activities like browsing, streaming or gaming might have been and could continue to be slower than promised, especially when multiple people or devices are using the internet at the same time.

NBN Co estimates that capacity upgrades to improve the speed congestion will be completed by Dec-19.

At the time we were given the option of moving to a lower speed plan with a $10 refund because we weren’t getting the advertised speed, or to wait it out on our current plan. We chose the latter, because if we’d downgraded, that would have reduced our speed during the day, when everything was otherwise fine.

We did not receive any notification from iiNet of exactly when works would commence, nor was I ever able to find any indication of planned maintenance on iiNet’s status page. Instead, I’ve come to rely on notifications from my neighbour, who’s with activ8me. He receives helpful emails like this:

This is a courtesy email from Activ8me, Letting you know NBN will be performing Fixed Wireless Network capacity work in your area that might affect your connectivity to the internet. This activity is critical to the maintenance and optimisation of the network. The approximate dates of this maintenance/upgrade work will be:

Impacted location: Neika, TAS & Downstream Sites & Upstream Sites
NBN estimates interruption 1 (Listed Below) will occur between:
Start: 24/09/19 7:00AM End: 24/09/19 8:00PM
NBN estimates interruption 2 (Listed Below) will occur between:
Start: 25/09/19 7:00AM End: 25/09/19 8:00PM
NBN estimates interruption 3 (Listed Below) will occur between:
Start: 01/10/19 7:00AM End: 01/10/19 8:00PM
NBN estimates interruption 4 (Listed Below) will occur between:
Start: 02/10/19 7:00AM End: 02/10/19 8:00PM
NBN estimates interruption 5 (Listed Below) will occur between:
Start: 03/10/19 7:00AM End: 03/10/19 8:00PM
NBN estimates interruption 6 (Listed Below) will occur between:
Start: 04/10/19 7:00AM End: 04/10/19 8:00PM
NBN estimates interruption 7 (Listed Below) will occur between:
Start: 05/10/19 7:00AM End: 05/10/19 8:00PM
NBN estimates interruption 8 (Listed Below) will occur between:
Start: 06/10/19 7:00AM End: 06/10/19 8:00PM

Change start
24/09/2019 07:00 Australian Eastern Standard Time

Change end
06/10/2019 20:00 Australian Eastern Daylight Time

This is expected to improve your service with us however, occasional loss of internet connectivity may be experienced during the maintenance/upgrade work.
Please note that the upgrades are performed by NBN Co and Activ8me has no control over them.
Thank you for your understanding in this matter, and your patience for if it does affect your service. We appreciate it.

The astute observer will note that this is pretty close to two weeks of scheduled maintenance. Sure enough, my neighbour and I (and presumably everyone else in the area) enjoyed major outages almost every weekday during that period, which is not ideal when you work from home. But, like I said at the start, they did finally get the job done.

Interestingly, according to activ8me, there is yet more NBN maintenance scheduled from 21 October 07:00 ’til 27 October 21:00, then again from 28 October 07:00 ’til 3 November 21:00 (i.e. another two whole weeks). The only scheduled upgrade I could find listed on iiNet’s status page is CM-177373, starting “in 13 days” with a duration of 6 hours, so possibly not the same thing.

Based on the above, I am convinced that there is some problem with iiNet’s status page not correctly reporting NBN incidents, but of course I have no idea whether this is NBN Co not telling iiNet, iiNet not listening to NBN Co, or if it’s just that the status web page is busted.

LUV October 2019 Workshop: Ubuntu 19.10 Eoan Ermine

Oct 19 2019 12:30
Oct 19 2019 16:30
Location: 
Infoxchange, 33 Elizabeth St. Richmond

Ubuntu 19.10 Eoan Ermine

The latest version of Ubuntu Linux has been released!  Come along to learn what's new and try it out, or get help upgrading.  This version adds Raspberry Pi 4 support and experimental support for ZFS as a root filesystem.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121.  Late arrivals please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.

October 19, 2019 - 12:30

October 09, 2019

AWS Welcomes Stewart

A little over a month ago now, I started a new role at Amazon Web Services (AWS) as a Principal Engineer with Amazon Linux. Everyone has been wonderfully welcoming and helpful. I’m excited about the future here, the team, and our mission.

Thanks to all my IBM colleagues over the past five and a half and a bit years too, I really enjoyed working with you on OpenPOWER and hope it continues to gain traction. I have my Blackbird now and am eagerly waiting for a spare 20 minutes to assemble it.

October 04, 2019

Linux Security Summit North America 2019: Videos and Slides

LSS-NA for 2019 was held in August in San Diego.  Slides are available at the Schedule, and videos of the talks may now be found in this playlist.

LWN covered the following presentations:

The new 3-day format (as previously discussed) worked well, and we’re expecting to continue this next year for LSS-NA.

Details on the 2020 event will be announced soon!

Announcements may be found on the event twitter account @LinuxSecSummit, on the linux-security-module mailing list, and via this very blog.

Announcing the DrupalSouth Diversity Scholarship

Over the years I have benefited greatly from the generosity of the Drupal Community. In 2011 people sponsored me to write lines of code to get me to DrupalCon Chicago.

Today Dave Hall Consulting is a very successful small business. We have contributed code, time and content to Drupal. It is time for us to give back in more concrete terms.

We want to help someone from an under-represented group take their career to the next level. This year we will provide a Diversity Scholarship for one person to attend DrupalSouth, our 2 day Gettin’ Git training course and 5 nights at the conference hotel. This will allow this person to attend the premier Drupal event in the region while also learning everything there is to know about git.

To apply for the scholarship, fill out the form by 23:59 AEST 19 October 2019 to be considered. (Extended from 12 October)

The winner has been announced.

October 03, 2019

Installing LineageOS 16 on a Samsung SM-T710 (gts28wifi)

  0. Check the prerequisites
  1. Backup any files you want to keep
  2. Download LineageOS ROM and optional GAPPS package
  3. Copy LineageOS image & additional packages to the SM-T710
  4. Boot into recovery mode
  5. Wipe the existing installation
  6. Format the device
  7. Install LineageOS ROM and other optional ROMs

0 - Check the Prerequisites

  • The device already has the latest TWRP installed.
  • Android debugging is enabled on the device
  • ADB is installed on your workstation.
  • You have a suitably configured SD card handy as a backup.

I use this android.nix to ensure my NixOS environment has the prerequisites installed and configured for its side of the process.

1 - Backup any Files You Want to Keep

I like to use adb to pull the files from the device. Other methods are available too.

$ adb pull /sdcard/MyFolder ./Downloads/MyDevice/

Usage of adb is documented at Android Debug Bridge

2 - Download LineageOS ROM and optional GAPPS package

I downloaded lineage-16.0-20191001-UNOFFICIAL-gts28wifi.zip from gts28wifi.

I also downloaded Open GApps ARM, nano to enable Google Apps.

I could have also downloaded and installed LineageOS addonsu and addonsu-remove but opted not to at this point.

3 - Copy LineageOS image & additional packages to the SM-T710

I use adb to copy the files across:

$ adb push ./lineage-16.0-20191001-UNOFFICIAL-gts28wifi.zip /sdcard/
./lineage-16.0-20191001-UNOFFICIAL-gts28wifi.zip: 1 file pushed. 12.1 MB/s (408677035 bytes in 32.263s)
$ adb push ./open_gapps-arm-9.0-nano-20190405.zip /sdcard/
./open_gapps-arm-9.0-nano-20190405.zip: 1 file pushed. 11.1 MB/s (185790181 bytes in 15.948s)

I also copy both to the SD card at this point as the SM-T710 is an awful device to work with and in many random cases will not work with ADB. When this happens, I fall back to the SD card.

4 - Boot into recovery mode

I power the device off, then power it back into recovery mode by holding down [home]+[volume up]+[power].

5 - Wipe the existing installation

Press Wipe then Advanced Wipe.

Select:

  • Dalvik / Art Cache
  • System
  • Data
  • Cache

Swipe the Swipe to Wipe slider at the bottom of the screen.

Press Back to return to the Advanced Wipe screen.

Press the triangular "back" button once to return to the Wipe screen.

6 - Format the device

Press Format Data.

Type yes and press the blue check mark at the bottom-right corner to commence the format process.

Press Back to return to the Advanced Wipe screen.

Press the triangular "back" button twice to return to the main screen.

7 - Install LineageOS ROM and other optional ROMs

Press Install, select the images you wish to install and swipe to make it go.

Reboot when it's completed and you should be off and running with a brand new LineageOS 16 on this tablet.

October 02, 2019

Percona Live Europe Amsterdam Day 1 notes

Percona Live Europe Amsterdam Day 1 was a bunch of fun, especially since I didn’t have to give a talk or anything since my tutorial was over on Day 0.

At lunch, I realised that there are a lot more fringe events happening around Percona Live… and if you’ve seen how people do “tech weeks”, maybe this is what the event ends up being – a show, plus plenty of focused satellite events. FOSDEM in the open source world totally gets this, and best of all, also lists fringe events (see example from 2019).

So, Thursday evening gets a few fringe events, a relatively short train ride away:

Anyway, what was Day 1 like? Keynotes started the morning, and I did make a Twitter thread. It is clear that there is a lot of talk amongst companies that make open source software, and companies in the ecosystem that somehow also derive great value from it. Some look at this as the great cloud vendors vs open source software vendors debate, but this isn’t necessarily always the case – we’ve seen this as Percona’s model too. And we’ve seen cloud companies contribute back (again, just like Percona). Guess this is a topic for a different post, because there are always two sides to this situation…

It is also clear that people want permissive open source licenses over anything source available. If you’re a CxO looking for software, it would be almost irresponsible to be using critical software that is just source available with a proprietary license. After all, what happens when the company decides to ask for more money? (Companies change ownership too, you know).

It is probably clear the best strategies are the “multi” (or hybrid) strategies. Multiple databases, multiple clouds, and going all in on open source to avoid vendor lock-in. Of course, don’t forget that open source software also can have “vendor lock-in” – always look at the health metrics of a project, vs. a product. We’re lucky in the MySQL ecosystem that we have not just the excellent work of Oracle, but also MariaDB Corporation / MariaDB Foundation and also Percona.

MySQL 8.0 adoption is taking off, with about 26% of the users on it. Those on MySQL 5.6 still seem to be on it, and there has been a decrease in 5.7 use to grow that 8.0 pie. It isn’t clear how these stats are generated (since there is no “phone home” functionality in MySQL; also the MariaDB Server variant doesn’t get as much usage as one would like), but maybe it is via download numbers?

Anyone paying any attention to MySQL 8 will know that they have switched to a “continuous delivery model”, also known as, you get new features in every point release. So the latest 8.0.18 gets EXPLAIN ANALYZE, and while we can’t try it yet (not released, and the documentation isn’t updated), I expect it will be fairly soon. I am eager to try this, because MariaDB Server has had ANALYZE since 10.1 (GA – Oct 2015). And it wasn’t long ago that MySQL received CHECK constraints support (8.0.16). Also the CLONE plugin in 8.0.17 warrants some checking/usage!

Besides all the hallway chats and meetings I did manage to get into a few sessions… Rakuten Intelligence talked about their usage of ProxySQL, and one thing was interesting with regard to their future plans slide – they do consider group replication but they wonder what would replace their custom HA software? But most importantly they wonder if it is stable and which companies have successfully deployed it, because they don’t want to be the first. Question from the floor about Galera Cluster came up, and they said they had one app that required XA support – looks like something to consider once Galera 4 is fully baked!

The PXC–8 talk was also chock full of information, delivered excellently, and something to try soon (it wasn’t quite available yesterday, but today I see a release announcement: Experimental Binary of Percona XtraDB Cluster 8.0).

I enjoyed the OpenCorporates use case at the end too. It covered why being on-premise would be cheaper for them than the cloud, and how they use ProxySQL, Percona XtraDB Cluster (PXC, a branch of Galera Cluster) and ZFS. ZFS is not the most common filesystem for MySQL deployments, so it was interesting to see what could be achieved.

Then there was the Booking.com party and boy, did they outdo themselves. We had a menu, multi-course meal with wine pairings, and a lot of good conversation. A night wouldn’t be complete without some Salmiakkikossu, and Monty sent some over for us to enjoy.

Food at the Hilton has been great too (something I would never really want to say, considering I’m not a fan of the hotel chain) – even the coffee breaks are well catered for. I think maybe this has been the best Percona Live in terms of catering, and I’ve been to a lot of them (maybe all…). I have to give much kudos to Bronwyn and Lorraine at Percona for the impeccable organisation. The WiFi works a charm as well. On towards Day 2!

September 30, 2019

ProxySQL Technology Day Ghent 2019

Just delivered a tutorial on MariaDB Server 10.4. Decided to take a closer look at the schedule for Percona Live Europe Amsterdam 2019 and one thing is clear: feels like there should also be a ProxySQL tutorial, largely because at mine, I noticed like 20% of the folk saying they use it.

Seems like there are 2 talks about it though, one about real world usage on Oct 1, and one about firewall usage with AWS, given by Marco Tusa on Oct 2.

Which led me to the ProxySQL Technology Day 2019 in Ghent, Belgium. October 3 2019. 2 hour train ride away from Amsterdam Schiphol (the airport stop). It is easy to grab a ticket at Schiphol Plaza, first class is about €20 more per way than second class, and a good spot to stay could be the Ibis Budget Dampoort (or the Marriott Ghent). Credit card payments accepted naturally, and I’m sure you can also do this online. Didn’t take me longer than five minutes to get all this settled.

So, the ProxySQL Technology Day is free, seems extremely focused and frankly is refreshing because you just learn about one thing! I feel like the MySQL world misses out on this tremendously as we lost the users conference… Interesting to see if this happens more in our ecosystem!

September 28, 2019

Using pipefail with shell module in Ansible

If you’re using the shell module with Ansible and piping the output to another command, it might be a good idea to set pipefail. This way, if the first command fails, the whole task will fail.

For example, let’s say we’re running this silly task to look for the /tmp directory and then strip the characters “t”, “m” and “p” from the result.

ansible all -i "localhost," -m shell -a \
'ls -ld /tmp | tr -d tmp'

This will return something like this, with a successful return code.

localhost | CHANGED | rc=0 >>
drwxrwxrw. 26 roo roo 640 Se 28 19:08 /

But, let’s say the directory doesn’t exist, what would the result be?

ansible all -i "localhost," -m shell -a \
'ls -ld /tmpnothere | tr -d tmp'

Still a success, because the last command in the pipe (tr) succeeded, even though we can see the ls command failed.

localhost | CHANGED | rc=0 >>
ls: cannot access ‘/tmpnothere’: No such file or directory

This time, let’s set pipefail first.

ansible all -i "localhost," -m shell -a \
'set -o pipefail && ls -ld /tmpnothere | tr -d tmp'

This time it fails, as expected.

localhost | FAILED | rc=2 >>
ls: cannot access ‘/tmpnothere’: No such file or directorynon-zero return code
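
The same behaviour can be reproduced in plain bash, without Ansible in the picture. The sketch below assumes bash is installed and that /tmpnothere does not exist; the variable names are only for illustration. It runs the pipeline in a child shell twice and records the exit status a caller would see.

```shell
#!/usr/bin/env bash
# Run the pipeline from the example in a child bash, with and without pipefail,
# and record the exit status a caller such as Ansible would see.
# Assumes /tmpnothere does not exist, as in the example above.

pipeline='ls -ld /tmpnothere 2>/dev/null | tr -d tmp'

# Without pipefail the status of the pipeline is that of the last command (tr).
rc_plain=0
bash -c "$pipeline" || rc_plain=$?

# With pipefail a failure in any stage (here ls) makes the whole pipeline fail.
rc_pipefail=0
bash -c "set -o pipefail && $pipeline" || rc_pipefail=$?

echo "without pipefail: rc=$rc_plain"
echo "with pipefail:    rc=$rc_pipefail"
```

The first line should report rc=0 and the second a non-zero status; the exact value depends on the ls implementation.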

If /bin/sh on the remote node does not point to bash then you’ll need to pass in an argument specifying bash as the executable to use for the shell task.

  - name: Silly task
    shell: set -o pipefail && ls -ld /tmp | tr -d tmp
    args:
      executable: /usr/bin/bash

Ansible lint will pick these things up for you, so why not run it across your code 😉

September 27, 2019

If I Understood You, Would I Have This Look on My Face?

This book discusses science and technical communication from the perspective of someone who comes from professional theatre and acting. Alan explains how his accidental discovery of the application of theatre sports to communication created an opportunity to teach technical communicators how to be more effective. Essentially, the argument is that empathy is essential to communication: you need to be able to understand where your audience is starting and where they’re likely to get stuck before you can take them on the journey.

Unsurprisingly given the topic of the book, this is a well written and engaging read. The book is nicely structured and uses regular anecdotes (some of them humorous) to get its message across.

A detailed and fun read.

If I Understood You, Would I Have This Look on My Face?
Alan Alda
Self-Help
Random House
June 6, 2017
240

NEW YORK TIMES BESTSELLER • Award-winning actor Alan Alda tells the fascinating story of his quest to learn how to communicate better, and to teach others to do the same. With his trademark humor and candor, he explores how to develop empathy as the key factor. “Invaluable.”—Deborah Tannen, #1 New York Times bestselling author of You’re the Only One I Can Tell and You Just Don’t Understand Alan Alda has been on a decades-long journey to discover new ways to help people communicate and relate to one another more effectively. If I Understood You, Would I Have This Look on My Face? is the warm, witty, and informative chronicle of how Alda found inspiration in everything from cutting-edge science to classic acting methods. His search began when he was host of PBS’s Scientific American Frontiers, where he interviewed thousands of scientists and developed a knack for helping them communicate complex ideas in ways a wide audience could understand—and Alda wondered if those techniques held a clue to better communication for the rest of us. In his wry and wise voice, Alda reflects on moments of miscommunication in his own life, when an absence of understanding resulted in problems both big and small. He guides us through his discoveries, showing how communication can be improved through learning to relate to the other person: listening with our eyes, looking for clues in another’s face, using the power of a compelling story, avoiding jargon, and reading another person so well that you become “in sync” with them, and know what they are thinking and feeling—especially when you’re talking about the hard stuff. Drawing on improvisation training, theater, and storytelling techniques from a life of acting, and with insights from recent scientific studies, Alda describes ways we can build empathy, nurture our innate mind-reading abilities, and improve the way we relate and talk with others. 
Exploring empathy-boosting games and exercises, If I Understood You is a funny, thought-provoking guide that can be used by all of us, in every aspect of our lives—with our friends, lovers, and families, with our doctors, in business settings, and beyond. “Alda uses his trademark humor and a well-honed ability to get to the point, to help us all learn how to leverage the better communicator inside each of us.”—Forbes “Alda, with his laudable curiosity, has learned something you and I can use right now.”—Charlie Rose

September 26, 2019

Talking with WP&UP

At WordCamp Europe this year, I had the opportunity to chat with the folks at WP&UP, who are doing wonderful work providing mental health support in the WordPress community.

Listen to the podcast, and check out the services that WP&UP provide!

September 21, 2019

Restricting third-party iframe widgets using the sandbox attribute, referrer policy and feature policy

Adding third-party embedded widgets on a website is a common but potentially dangerous practice. Thankfully, the web platform offers a few controls that can help mitigate the risks. While this post uses the example of an embedded SurveyMonkey survey, the principles can be used for all kinds of other widgets.

Note that this is by no means an endorsement of SurveyMonkey's proprietary service. If you are looking for a survey product, you should consider a free and open source alternative like LimeSurvey.

SurveyMonkey's snippet

In order to embed a survey on your website, the SurveyMonkey interface will tell you to install the following website collector script:

<script>(function(t,e,s,n){var
o,a,c;t.SMCX=t.SMCX||[],e.getElementById(n)||(o=e.getElementsByTagName(s),a=o[o.length-1],c=e.createElement(s),c.type="text/javascript",c.async=!0,c.id=n,c.src=["https:"===location.protocol?"https://":"http://","widget.surveymonkey.com/collect/website/js/tRaiETqnLgj758hTBazgd9NxKf_2BhnTfDFrN34n_2BjT1Kk0sqrObugJL8ZXdb_2BaREa.js"].join(""),a.parentNode.insertBefore(c,a))})(window,document,"script","smcx-sdk");</script><a
style="font: 12px Helvetica, sans-serif; color: #999; text-decoration:
none;" href=https://www.surveymonkey.com> Create your own user feedback
survey </a>

which can be rewritten in a more understandable form as:

(
function (s) {
  var scripts, last_script, new_script;
  window.SMCX = window.SMCX || [],
  document.getElementById("smcx-sdk") ||
    (
      scripts = document.getElementsByTagName("script"),
      last_script = scripts[scripts.length - 1],
      new_script = document.createElement("script"),
      new_script.type = "text/javascript",
      new_script.async = true,
      new_script.id = "smcx-sdk",
      new_script.src =
        [
          "https:" === location.protocol ? "https://" : "http://",
          "widget.surveymonkey.com/collect/website/js/tRaiETqnLgj758hTBazgd9NxKf_2BhnTfDFrN34n_2BjT1Kk0sqrObugJL8ZXdb_2BaREa.js"
        ].join(""),
      last_script.parentNode.insertBefore(new_script, last_script)
    )
  }
)();

The fact that this adds a third-party script dependency to your website is problematic because it means that a security vulnerability in their infrastructure could lead to a complete compromise of your site, thanks to third-party scripts having full control over your website. Security issues aside though, this could also enable this third-party to violate your users' privacy expectations and extract any information displayed on your site for marketing purposes.

However, if you embed the snippet on a test page and inspect it with the developer tools, you will find that it actually creates an iframe:

<iframe
    width="500"
    height="500"
    frameborder="0"
    allowtransparency="true"
    src="https://www.surveymonkey.com/r/D3KDY6R?embedded=1"
></iframe>

and you can use that directly on your site without having to load their script.

Mixed content anti-pattern

As an aside, the script snippet they propose makes use of a common front-end anti-pattern:

"https:"===location.protocol?"https://":"http://"

This is presumably meant to avoid inserting an HTTP script element into an HTTPS page, since that would be considered mixed content and get blocked by browsers. However, it is entirely unnecessary: an HTTP page never prohibits embedding HTTPS content, so one should only ever use the HTTPS version of such scripts.

In other words, the above code snippet can be simplified to:

"https://"

Restricting iframes

Thanks to defenses which have been added to the web platform recently, there are a few things that can be done to constrain iframes.

Firstly, you can choose to hide your full page URL from SurveyMonkey using the referrer policy:

referrerpolicy="strict-origin"

This may seem harmless, but page URLs sometimes include sensitive information in the URL path or query string, for example, search terms that a user might have typed. The strict-origin policy will limit the referrer to your site's hostname, port and protocol.

Secondly, you can prevent the iframe from being able to access anything about its embedding page or to trigger popups and unwanted downloads using the sandbox attribute:

sandbox="allow-scripts allow-forms"

Ideally, the contents of this attribute would be empty so that all restrictions would be active, but SurveyMonkey is a JavaScript application and it of course needs to submit a form since that's the purpose of the widget.

Finally, a new experimental capability is making its way into browsers: feature policy. In the context of untrusted iframes, it enables developers to explicitly disable certain powerful features:

allow="accelerometer 'none';
       ambient-light-sensor 'none';
       camera 'none';
       display-capture 'none';
       document-domain 'none';
       fullscreen 'none';
       geolocation 'none';
       gyroscope 'none';
       magnetometer 'none';
       microphone 'none';
       midi 'none';
       payment 'none';
       usb 'none';
       vibrate 'none';
       vr 'none';
       webauthn 'none'"

Putting it all together, we end up with the following HTML snippet:

<iframe
    width="500"
    height="500"
    frameborder="0"
    allowtransparency="true"
    allow="accelerometer 'none'; ambient-light-sensor 'none';
           camera 'none'; display-capture 'none';
           document-domain 'none'; fullscreen 'none';
           geolocation 'none'; gyroscope 'none'; magnetometer 'none';
           microphone 'none'; midi 'none'; payment 'none'; usb 'none';
           vibrate 'none'; vr 'none'; webauthn 'none'"
    sandbox="allow-scripts allow-forms"
    referrerpolicy="strict-origin"
    src="https://www.surveymonkey.com/r/D3KDY6R?embedded=1"
></iframe>

Content Security Policy

Another advantage of using the iframe directly is that instead of loosening your site's Content Security Policy by adding all of the following:

  • script-src https://www.surveymonkey.com
  • img-src https://www.surveymonkey.com
  • frame-src https://www.surveymonkey.com

you can limit the extra directives to just the frame controls:

  • frame-src https://www.surveymonkey.com

CSP Embedded Enforcement would be another nice mechanism to make use of, but looking at SurveyMonkey's CSP policy:

Content-Security-Policy:
  default-src https: data: blob: 'unsafe-eval' 'unsafe-inline'
      wss://*.hotjar.com 'self';
  img-src https: http: data: blob: 'self';
  script-src https: 'unsafe-eval' 'unsafe-inline' http://www.google-analytics.com http://ajax.googleapis.com
      http://bat.bing.com http://static.hotjar.com http://www.googleadservices.com
      'self';
  style-src https: 'unsafe-inline' http://secure.surveymonkey.com 'self';
  report-uri https://csp.surveymonkey.com/report?e=true&c=prod&a=responseweb

it allows the injection of arbitrary Flash files, inline scripts, evals and any other scripts hosted on an HTTPS URL, which means that it doesn't really provide any meaningful security benefits.

Embedded enforcement is therefore not a usable security control in this particular example until SurveyMonkey gets a stricter CSP policy.
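As a rough illustration of why that policy is weak, the following Python sketch parses a CSP header value and lists the script sources that undo most of its protection. The helper names are invented for this example, the policy string is an abbreviated copy of the one quoted above, and the set of "risky" sources is a simplification rather than a complete audit:

```python
def parse_csp(header_value: str) -> dict:
    """Split a CSP header value into {directive: [sources]}."""
    policy = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if tokens:
            # Only the first occurrence of a directive counts.
            policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def weak_script_sources(policy: dict) -> list:
    """Return script sources that allow inline, eval'd or arbitrary scripts."""
    # script-src falls back to default-src when it is absent.
    sources = policy.get("script-src", policy.get("default-src", []))
    risky = {"'unsafe-inline'", "'unsafe-eval'", "https:", "http:", "*"}
    return [s for s in sources if s in risky]


# Abbreviated from the SurveyMonkey policy shown above.
csp = (
    "default-src https: data: blob: 'unsafe-eval' 'unsafe-inline' "
    "wss://*.hotjar.com 'self'; "
    "script-src https: 'unsafe-eval' 'unsafe-inline' "
    "http://www.google-analytics.com 'self'"
)
print(weak_script_sources(parse_csp(csp)))
```

Any one of the reported sources is enough to let injected scripts run, which is why requiring this policy via embedded enforcement would add little.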

September 20, 2019

Running a non-root container on Fedora with podman and systemd (Home Assistant example)

Similar to my post about running Home Assistant on Fedora in Docker, this is about using podman instead and integrating the container as a service with systemd. One of the major advantages to me is the removal of the Docker daemon and integration with the rest of the system, including management of dependencies like regular services.

This assumes you’ve just installed Fedora server and have a local user with sudo privileges. Let’s also install some SELinux tools.

sudo dnf install -y /usr/sbin/semanage

Create non-root user

Let’s create a specific user to run the Home Assistant service.

We could create a regular user (and remove password expiry settings), but as this is a service let’s create a system account even though it’s a bit more tricky.

sudo useradd -r -m -d /var/lib/hass hass

As this is a system account, we’ll need to manually specify sub user and group ids that the account is allowed to use inside the container. We work out what range is available by looking at the /etc/subuid and /etc/subgid files on the host. Ideally the UID and GID ranges should be the same.

NEW_SUBUID=$(($(tail -1 /etc/subuid |awk -F ":" '{print $2}')+65536))
NEW_SUBGID=$(($(tail -1 /etc/subgid |awk -F ":" '{print $2}')+65536))
sudo usermod \
--add-subuids  ${NEW_SUBUID}-$((${NEW_SUBUID}+65535)) \
--add-subgids  ${NEW_SUBGID}-$((${NEW_SUBGID}+65535)) \
hass
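The arithmetic in those shell commands can be sketched in Python as follows. This is only an illustration of the calculation, not a replacement for usermod. Like the shell version, it assumes the last line of the file holds the highest range and that the range is 65536 ids long. The function name and the sample entries are hypothetical:

```python
def next_subid_range(subid_text: str, count: int = 65536) -> tuple:
    """Return (first, last) of a new id range placed after the last entry.

    Mirrors the shell arithmetic: start of the last entry plus 65536.
    """
    lines = [ln for ln in subid_text.splitlines() if ln.strip()]
    # Each entry looks like name:start:count; field 2 is the start id.
    last_start = int(lines[-1].split(":")[1])
    first = last_start + count
    return first, first + count - 1


# Hypothetical contents of /etc/subuid on a host with two regular users.
example = "alice:100000:65536\nbob:165536:65536\n"
print(next_subid_range(example))  # (231072, 296607)
```

If an existing entry is larger than 65536 ids, or the file is not sorted, the new range could overlap an old one, so it is worth checking the files by eye before running usermod.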

Inside the hass user’s home directory, create a config directory to store configuration files and ssl to store SSL certificates. These will be mapped into the container as /config and /ssl respectively. We will also set the appropriate SELinux context so that the directories can be accessed in the container.

sudo -H -u hass bash -c "mkdir ~/{config,ssl}"
sudo semanage fcontext -a -t user_home_dir_t "/var/lib/hass(/.+)?"
sudo semanage fcontext -a -t svirt_sandbox_file_t \
"/var/lib/hass/((config)|(ssl))(/.+)?"
sudo restorecon -Frv /var/lib/hass

Pull the container image

Now that we have the basic home directory in place, we can switch to the hass user with sudo.

sudo su - hass

As the hass user, let’s use podman to download and run the official Home Assistant image in a container.

First, pull the container which is stored under the non-root user’s ~/.local/share/containers/ directory. Note the latest tag on the end of the image name specifies the version to run. While this is not necessary if you’re getting the latest (as it’s the default), if you want a specific release simply replace latest with the version you want (see their Docker hub page for available releases). Specifying latest means we’ll get the latest release of the container at the time.

podman pull \
docker.io/homeassistant/home-assistant:latest

Manually start the container

Now we can spin up a container using that image. Note that we’re passing in the config and ssl (as read only) directories we created earlier and using host networking to open the required ports on the host.

podman run -dt \
--name=hass \
-v /var/lib/hass/config:/config \
-v /var/lib/hass/ssl:/ssl:ro \
-v /etc/localtime:/etc/localtime:ro \
--net=host \
docker.io/homeassistant/home-assistant:latest

Similar to Docker, you can look at the status of the container and manage it with podman, including getting the logs if you require them.

podman ps -a
podman logs hass
podman restart hass

To get a temporary shell on the container, execute bash.

podman exec -it hass /bin/bash

Inside the container, take a look at the passed-in /config directory (do anything else you want) and then exit when you’re done.

ls -l /config
id
echo "I am root in the container"
exit

Once the container is up and running, the Home Assistant port should be listening on the host.

ss -ltn |grep 8123

Manually destroy the container

Next we’ll create a service to manage this, so for now you can stop and delete this container (this does not delete the image we downloaded). Do this as the hass user still, then exit to return to your regular user.

podman stop hass
podman rm hass
podman ps -a
exit

Configuring the firewall

Home Assistant runs on port 8123, so we will need to open this port on the firewall (back as your regular user).

sudo firewall-cmd --get-active-zones
sudo firewall-cmd --zone=FedoraServer --add-port=8123/tcp

You can test this by using your web browser to connect to the IP address of your machine on port 8123 from another machine on your network.

If that works, make the firewall change permanent.

sudo firewall-cmd --runtime-to-permanent

Create service for the container

Now that we have the container that works, let’s create a systemd service to manage it. This will auto start the container on boot and allow us to manage it as a regular service, including any dependencies. This service stops, removes and starts a new container every time.

Note the Exec lines which will delete and restart the container from the image. As per the manual command above, to run a specific version replace latest with an available tagged release.

cat << EOF | sudo tee /etc/systemd/system/hass.service
[Unit]
Description=Home Assistant in Container
After=network.target

[Service]
User=hass
Group=hass
Type=simple
TimeoutStartSec=5m
ExecStartPre=-/usr/bin/podman rm -f "hass"
ExecStart=/usr/bin/podman run --name=hass -v /var/lib/hass/ssl:/ssl:ro -v /var/lib/hass/config:/config -v /etc/localtime:/etc/localtime:ro --net=host docker.io/homeassistant/home-assistant:latest
ExecReload=-/usr/bin/podman stop "hass"
ExecReload=-/usr/bin/podman rm "hass"
ExecStop=-/usr/bin/podman stop "hass"
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

Reloading the systemd daemon is required to pick up the new file.

sudo systemctl daemon-reload

Manage the container with systemd

Let’s see if we can restart the container and check its status. Because it is now managed by systemd, we can check the log with journalctl.

sudo systemctl restart hass
sudo systemctl status hass
sudo journalctl -u hass

Once you’re happy, we can enable the service.

sudo systemctl enable hass

Now is probably a good time to reboot your machine and make sure that the service comes up fine on boot.

Configuring Home Assistant

After rebooting, you should be able to browse to the Home Assistant port on your machine.

Now that you have Home Assistant running, modify the configuration as you please by editing the configuration file under the hass user home directory.

If you make a change, you can simply restart the service.

Updating the container

To update the container, switch to the hass user again and pull a newer version of the container. We can see the newer version of the image with podman and if you want to you can inspect the image for more details.

podman pull docker.io/homeassistant/home-assistant:latest
podman images -a
podman inspect docker.io/homeassistant/home-assistant:latest

Now you can restart the container as your regular user.

sudo systemctl restart hass
sudo journalctl -fu hass.service

Conclusion

Anyway, that’s an example of how you could do it with something like Home Assistant. It can be modified accordingly for any other container you might want to run.

September 18, 2019

Sadly leaving the NSW Government

This week was sadly my last week with the NSW Government, Department of Customer Service, formerly the Department of Finance, Services and Innovation. I am sad to be leaving such an exciting place at such an exciting time, but after 12 months of commuting from Canberra to Sydney, the commute has been, by far, the hardest part of working in the NSW Government. I have been leaving my little family every week for 3, 4 or 5 days, and although we have explored possibilities to move, my family and I have to continue living in Canberra for the time being. It has got to the point where my almost 4 year old has asked me to choose her over work, a heart breaking scenario as many will understand.

I wanted to publicly thank everyone I worked with, particularly my amazing teams who have put their heart, soul and minds to the task of making exceptional public services in an exceptional public sector. I am really proud of the two Branches I had the privilege and delight to lead, and I know whatever comes next, that those 160 or so individuals will continue to do great things wherever they go. 

I remain delighted and amazed at the unique opportunity in NSW Government to lead the way for truly innovative, holistic and user centred approaches to government. The commitment and leadership from William Murphy, Glenn King, Greg Wells, Damon Rees, Emma Hogan, Tim Reardon, Annette O’Callaghan, Michael Coutts-Trotter (and many others across the NSW Government senior executive) genuinely to my mind, has created the best conditions anywhere in Australia (and likely the world!) to make great and positive change in the public service.

I want to take a moment to also directly thank Martin Hoffman, Glenn, Greg, William, Amanda Ianna and all those who have supported me in the roles, as well as everyone from my two Branches over that 12 months for their support, belief and commitment. It has been a genuine privilege and delight to be a part of this exceptional department, and to see the incredible work across our Branches.

I have only been in the NSW Government for 12 months, and in that time was the ED for Digital Government Policy and Innovation for 9 months, and then ED Data, Insights and Transformation for a further 3 months.

In just 9 months, the Digital Government Policy and Innovation team achieved a lot in the NSW Government digital space, including:

  • Australia’s first Policy Lab (bringing agile test driven and user centred design methods into a traditional policy team),
  • the Digital Government Policy Landscape (mapping all digital gov policies for agencies) including IoT & a roadmap for an AI Ethics Framework and AI Strategy,
  • the NSW Government Digital Design Standard and a strong community of practice to contribute and collaborate
  • evolution of the Digital NSW Accelerator (DNA) to include delivery capabilities,
  • the School Online Enrolment system,
  • an operational and cross government Life Journeys Program (and subsequent life journey based navigators),
  • world leading Rules as Code exemplars and early exploration of developing human and machine readable legislation from scratch (Better Rules),
  • establishment of a digital talent pool for NSW Gov,
  • great improvements to data.nsw and whole of government data policy and the Information Management Framework,
  • capability uplift across the NSW public sector including the Data Champions network and digital champions,
  • a prototype whole of government CX Pipeline,
  • the Innovation NSW team were recognised as one of Apolitical’s 100+ teams teaching government the skills of the future with a range of Innovation NSW projects including several Pitch to Pilot events, Future Economy breakfast series,
  • and the improvements to engagement/support we provided across whole of government.

For the last 3 months I was lucky to lead the newly formed and very exciting Data, Insights and Transformation Branch, which included the Data Analytics Centre, the Behavioural Insights Unit, and a new Transformation function to explore how we could design a modern public service fit for the 21st century. In only 3 months we

  • established a strong team culture, developed a clear cohesive work program, strategic objectives and service offerings,
  • chaired the ethics board for behavioural insights projects, which was a great experience, and
  • were seeing new interest, leads and engagement from agencies who wanted to engage with the Data Analytics Centre, Behavioural Insights Unit or our new Transformation function.

It was wonderful to work with such a fantastic group of people and I learned a lot, including from the incredible leadership team and my boss, William Murphy, who shared the following kind words about my leaving:

As a passionate advocate for digital and transformative approaches to deliver great public services, Pia has also been working steadily to deliver on whole-of-government approaches such as Government as a Platform, service analytics and our newly formed Transformation agenda to reimagine government.

Her unique and effective blend of systems thinking, technical creativity and vision will ensure the next stage in her career will be just as rewarding as her time with Customer Service has been.

Pia has made the difficult decision to leave Customer Service to spend more time with her Canberra-based family.

The great work Pia and her teams have done over the last twelve months has without a doubt set up the NSW digital and customer transformation agenda for success.

I want to thank her for the commitment and drive she has shown in her work with the NSW Government, and wish her well with her future endeavours. I’m confident her focus on building exceptional teams, her vision for NSW digital transformation and the relationships she has built across the sector will continue.

For my part, I’m not sure what will come next, but I’m going to have a holiday first to rest, and probably spend October simply writing down all my big ideas and doing some work on rules as code before I look for the next adventure.

Deploying TT-RSS on NixOS

NixOS Gears by Craige McWhirter

Deploying a vanilla Tiny Tiny RSS server on NixOS via NixOps is fairly straightforward.

My preferred method is to craft a tt-rss.nix file that describes the configuration of the TT-RSS server.

tt-rss.nix:

{ config, pkgs, lib, ... }:

{

  services.tt-rss = {
    enable = true;                                # Enable TT-RSS
    database = {                                  # Configure the database
      type = "pgsql";                             # Database type
      passwordFile = "/run/keys/tt-rss-dbpass";   # Where to find the password
    };
    email = {
      fromAddress = "news@mydomain";              # Address for outgoing email
      fromName = "News at mydomain";              # Display name for outgoing email
    };
    selfUrlPath = "https://news.mydomain/";       # Root web URL
    virtualHost = "news.mydomain";                # Setup a virtualhost
  };

  services.postgresql = {
    enable = true;                # Ensure postgresql is enabled
    authentication = ''
      local tt_rss all ident map=tt_rss-users
    '';
    identMap =                    # Map the tt-rss user to postgresql
      ''
        tt_rss-users tt_rss tt_rss
      '';
  };

  services.nginx = {
    enable = true;                                          # Enable Nginx
    recommendedGzipSettings = true;
    recommendedOptimisation = true;
    recommendedProxySettings = true;
    recommendedTlsSettings = true;
    virtualHosts."news.mydomain" = {                        # TT-RSS hostname
      enableACME = true;                                    # Use ACME certs
      forceSSL = true;                                      # Force SSL
    };
  };

  security.acme.certs = {
      "news.mydomain".email = "email@mydomain";
  };

}

This line from the above file should stand out:

              passwordFile = "/run/keys/tt-rss-dbpass";   # Where to find the password

The passwordFile option requires that you use a secrets file with NixOps.

Where does that file come from? It's pulled from a secrets.nix file (example) that for this example, could look like this:

secrets.nix:

{ config, pkgs, ... }:

{
  deployment.keys = {
    # Database key for TT-RSS
    tt-rss-dbpass = {
      text        = "vaetohH{u9Veegh3caechish";   # Password, generated using pwgen -yB 24
      user        = "tt_rss";                     # User to own the key file
      group       = "wheel";                      # Group to own the key file
      permissions = "0640";                       # Key file permissions
    };

  };
}

The file's path /run/keys/tt-rss-dbpass is determined by the elements. So deployment.keys determines the initial path of /run/keys and the next element tt-rss-dbpass is a descriptive name provided by the stanza's author to describe the key's use and also provide the final file name.

Now that we have described the TT-RSS service in tt-rss_for_NixOps.nix and the required credentials in secrets.nix we need to pull it all together for deployment. We achieve that in this case by importing both these files into our existing host definition:

myhost.nix:

    {
      myhost =
        { config, pkgs, lib, ... }:

        {

          imports =
            [
              ./secrets.nix                               # Import our secrets
              ./servers/tt-rss_for_NixOps.nix              # Import TT-RSS description
            ];

          deployment.targetHost = "192.168.132.123";   # Target's IP address

          networking.hostName = "myhost";              # Target's hostname.
        };
    }

To deploy TT-RSS to your NixOps managed host, you merely run the deploy command for your already configured host and deployment, which would look like this:

    $ nixops deploy -d MyDeployment --include myhost

You should now have a running TT-RSS server and be able to log in with the default admin user (admin: password).

In my nixos-examples repo I have a servers directory with some example files and a README with information and instructions. You can use two of the files to generate a TT-RSS VM to take a quick poke around. There is also an example of how you can deploy TT-RSS in production using NixOps, as per this post.

If you wish to dig a little deeper, I have my production deployment over at mio-ops.

September 17, 2019

LUV October 2019 Main Meeting and AGM

Oct 1 2019 19:00
Oct 1 2019 21:00
Location: 
Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

NOTE: The library closes at 7pm so arrivals after that time will need to contact Andrew on (0421) 775 358 or any other attendee for admission.

Speakers:

Many of us like to go for dinner nearby after the meeting, typically at Brunetti's or Trotters Bistro in Lygon St.  Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.

October 1, 2019 - 19:00

Software Freedom Day 2019

Sep 21 2019 13:00
Sep 21 2019 18:00
Location: 
Electron Workshop, 31 Arden Street North Melbourne 3051

It's time once again to get excited about all the benefits that Free and Open Source Software have given us over the past year and get together to talk about how Freedom and Openness can improve our human rights, our privacy, our security and our communities. It's Software Freedom Day!

Linux Users of Victoria is a subcommittee of Linux Australia.

September 21, 2019 - 13:00

September 16, 2019

FreeDV 2020 over the QO-100 Satellite

Gerhard (OE3GBB), Steve (K5OKC) and I have been working on FreeDV 2020 over the Es’hail 2/QO-100 satellite. This satellite is in geosynchronous orbit and has a linear transponder. It’s designed for SSB so has a narrow bandwidth which rules out most digital voice modes, except FreeDV. For example, FreeDV 2020 can send 8 kHz wide speech over just 1600 Hz of RF bandwidth. A linear amplifier also means the OFDM waveforms used by FreeDV will pass OK, as long as your transmit system is linear.

Modem Mods

Gerhard’s initial experiments showed that FreeDV 1600 worked well, but FreeDV 2020 was breaking up and losing sync. We guessed that this was due to significant phase noise on the channel, from the many up and down conversion steps and the transponder itself. Fortunately the SNR was quite high.

Steve and I modified the OFDM modem used for FreeDV 2020 to handle this. This modem had been designed for coherent demodulation on very low SNR HF fading channels. The phase tracking was designed for HF channels with a bandwidth of a few Hz. As a first step we added a high bandwidth option, then moved to differential demodulation. This allows us to handle faster phase shifts (i.e. more phase noise), at the expense of reduced low SNR performance. This is an acceptable trade off for this channel as we have plenty of SNR.
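To illustrate the idea behind differential demodulation, here is a toy Python sketch using a single differential QPSK carrier. It is not the FreeDV OFDM modem code, and the channel (a fixed 70 degree offset plus 10 degrees of drift per symbol) is a made-up stand-in for phase noise. Because the data is carried in the phase change between consecutive symbols, the receiver never needs to track the absolute phase:

```python
import cmath
import math
import random

QPSK = [1, 1j, -1, -1j]  # phase steps of 0, 90, 180 and 270 degrees


def diff_encode(symbols):
    """Each transmitted sample is the previous one rotated by the data symbol."""
    tx, prev = [], 1 + 0j
    for s in symbols:
        prev = prev * QPSK[s]
        tx.append(prev)
    return tx


def diff_decode(rx):
    """Recover data from the phase change between consecutive samples."""
    out, prev = [], 1 + 0j
    for r in rx:
        step = cmath.phase(r * prev.conjugate())
        out.append(round(step / (math.pi / 2)) % 4)
        prev = r
    return out


random.seed(1)
data = [random.randrange(4) for _ in range(200)]
tx = diff_encode(data)
# Hypothetical channel: 70 degree offset plus 10 degrees of drift per symbol.
rx = [t * cmath.exp(1j * math.radians(70 + 10 * n)) for n, t in enumerate(tx)]
decoded = diff_decode(rx)
# The first symbol has no received reference, so compare from the second on.
print(decoded[1:] == data[1:])
```

A coherent receiver with slow phase tracking would see the constellation rotate through every quadrant on this channel. The cost is that each decision combines the noise of two received symbols, which is the reduced low SNR performance mentioned above.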

Gerhard also had some set-up problems getting everything to run on one machine – FreeDV 2020 likes a powerful, modern CPU due to the LPCNet codec.

I now have the rather complex Windows build process for FreeDV fully scripted (thanks Richard and Danilo). This means I can develop on Linux, then run a Docker script that rebuilds everything for Windows, packages it in an installer, and pops it up on my web site. Remarkably, it then produces the same results as the Linux version. This takes a lot of pain out of my life, and makes it easy for others to test innovations rapidly.

Here is a sample of the decoded audio, from a QSO between Gerhard and Dani (EA4GPZ):

The quality is quite high, and through a nice set of speakers the wide, 8kHz audio bandwidth is very pleasant. However I can hear some frame rate modulation, and I’ve heard similar on some other 2020 samples over HF channels from the US test team. I’ll explore that at some stage.

Gerhard’s QO100 Station

Team Work

I am enjoying working with Gerhard and Steve on this project. We are roughly equi-distant around the globe but the time zone shift allows us to bounce new software versions around for testing on 12 hour cycles. Working as a team, we investigated the problem and greatly improved the performance of FreeDV 2020 over the QO-100 satellite. We worked very carefully, debugging tricky problems, collecting and comparing samples, and discussing our results via email.

We have applied new speech coding technology (the neural net/machine learning based LPCNet), modified and optimised a HF modem, and sent our signals through a new satellite transponder. This is real experimental radio!

Our next step is to look at improving modem acquisition, which is also likely to require tuning for this channel.

Reading Further

Es’hail 2/QO-100 satellite
WebSDR for QO-100 satellite
FreeDV 2020 First On Air Tests
Steve Ports an OFDM modem from Octave to C

September 15, 2019

Prometheus 2.12, query logging, and startup failures on macos

Prometheus v2.12 added active query logging. The basic idea is that there is a mmapped JSON file that contains all of the queries currently running. If prometheus were to crash, that file would therefore be a list of the queries running at the time of the crash. Overall, not a bad idea.

Some friends had recently added prometheus to their development environments. This is wired up to grafana dashboards for their microservices, and prometheus is configured to store 14 days worth of time series data via a persistent volume from the developer desktops. We did this because it is valuable for the developers to be able to see the history of metrics before and after their changes.

Now we have a developer using macos as their primary development platform, and since prometheus 2.12 it hasn’t worked. Specifically this developer is using parallels to provide the docker virtual machine on his mac. You can summarise the startup for prometheus in the dev environment like this:

$ docker run ...stuff...
...snip...
level=error ts=2019-09-15T02:20:23.520Z caller=query_logger.go:94 component=activeQueryTracker msg="Failed to mmap" file=/prometheus-data/data/queries.active Attemptedsize=20001 err="invalid argument"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7fff9917af38, 0x15, 0x14, 0x2a6b7c0, 0xc00003c7e0, 0x2a6b7c0)
	/app/promql/query_logger.go:112 +0x4d2
main.main()
	/app/cmd/prometheus/main.go:361 +0x52bd

And here’s the underlying problem — because of the way the persistent data is mapped into this container (via parallels sharing in this case), the mmap of the active queries file fails and prometheus fails to start.

In other words, since prometheus 2.12 your prometheus data files have to be stored on a filesystem which supports mmap. Additionally, there is no flag to just disable the active query logger.
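If you want to check whether a given directory would hit this problem, a small probe along these lines may help. It is a Python sketch rather than anything from Prometheus itself: it creates a scratch file in the directory and tries to map it shared and writable. The function name is made up, and the default size is simply the Attemptedsize from the log above. Run it inside the container against the mounted data directory:

```python
import mmap
import os
import tempfile


def supports_mmap(directory: str, size: int = 20001) -> bool:
    """Return True if a file in `directory` can be mapped shared and writable."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)  # the file must be at least as long as the map
        try:
            with mmap.mmap(fd, size, access=mmap.ACCESS_WRITE) as m:
                m[0:1] = b"["  # touch the mapping to prove it is writable
            return True
        except (OSError, ValueError):
            return False
    finally:
        os.close(fd)
        os.unlink(path)


print(supports_mmap(tempfile.gettempdir()))
```

A False result for the data directory, alongside True for a container-local path such as /tmp, would match the failure described here.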

So how do we work around this? Well, here’s a horrible workaround: in the data directory that is volume mapped into the container, create a symlink pointing to a path that is mmappable inside the docker container, even if that path doesn’t exist outside the container. For example, given that we store the prometheus time series at $CONFIG/prometheus-data:

$ ln -s /tmp/queries.active "$CONFIG/prometheus-data/queries.active"

Note that /tmp/queries.active does not exist on the developer’s mac. Prometheus now starts and it’s puppies and kittens the whole way down.

September 14, 2019

VDSL versus HF Radio

I’m putting up a 40m dipole. When I Tx on 40m (50W peak) my Internet drops out. Sometimes it comes back, other times the modem loses sync. The dipole has a balun, and is nicely tuned.

I tried some ferrites with several turns on the modem VDSL and power cables which improved the situation a little. But I still get a momentary drop out of Internet on PTT, and if I try hard enough I can still lose sync on the modem.

Now I have NBN (Australian National Broadband Network) with a VDSL link over traditional copper phone lines to a “node” several hundred metres away. Turns out VDSL uses bandwidth up to 30 MHz … so I guess I’m getting right into its pass band. Old school ADSL only used a few MHz. The phone line used for this service is 50 years old and has significant differential to common mode conversion. It’s not much of a transmission line. But probably a pretty good antenna!

I built a little jig with a transformer to couple the differential signal to my spectrum analyser and take a look:

Lotsa turns on the primary, one turn on the secondary, some core I found in the junk box. I adjusted the coupling capacitors in line with both arms of the primary so that the modem didn’t lose sync when I plugged it in (about 5pF). Also in this photo is the series LC circuit, but disconnected (open) at this stage.

Sure enough, I could see Rx energy from the node to my modem at around 7MHz, and other energy out to 12MHz. In the 7MHz region, I could see the Rx signal from the “node” at -60dBm. When I Tx SSB on 7.18 MHz my SSB signal was -30dBm. No wonder the modem is choking.

After some experimentation, I came up with a 7MHz LC series resonant circuit connected across the phone line. When the modem does its training thing, it sees a short circuit around 7MHz and ignores that region as no good. So when I transmit in that region, there is no modem signal to interfere with.

I started with an 800nH/600pF filter. Xc and Xl are a rather low 37 ohms reactance at resonance, and just a bit higher than that above resonance (e.g. at 8-12 MHz), attenuating a lot of the HF energy. So it was basically a LPF, killing anything above 6 MHz. This stopped the drop outs, but my Internet downstream bandwidth dropped from 55 to 24 Mbps.

Final Design

After some fiddling with a spreadsheet I came up with a 5uH/100pF series LC notch filter. This is simply a 5uH inductor in series with a 100pF capacitor, connected across the VDSL phone line.

This has a few hundred ohms of Xc above resonance, which results in just a few dB attenuation at 8-12 MHz. This obtained 38 Mbps downstream. Upstream was the same (24 Mbps) as with no filter. Good enough.
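The spreadsheet sums behind the two designs can be reproduced with a few lines of Python. This sketch uses the standard ideal series LC formulas and ignores component losses and the line impedance, so it shows the trend rather than the measured attenuation. The function names are my own:

```python
import math


def resonant_freq_hz(l_henry: float, c_farad: float) -> float:
    """Resonant frequency of an ideal series LC circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def net_reactance_ohm(l_henry: float, c_farad: float, freq_hz: float) -> float:
    """Series reactance X_L - X_C (positive means inductive)."""
    w = 2.0 * math.pi * freq_hz
    return w * l_henry - 1.0 / (w * c_farad)


designs = [("800nH/600pF", 800e-9, 600e-12), ("5uH/100pF", 5e-6, 100e-12)]
for name, l, c in designs:
    f0 = resonant_freq_hz(l, c)
    x0 = math.sqrt(l / c)  # X_L and X_C are equal at resonance
    x10 = net_reactance_ohm(l, c, 10e6)
    print(f"{name}: f0 = {f0 / 1e6:.2f} MHz, "
          f"X at f0 = {x0:.0f} ohm, net X at 10 MHz = {x10:.0f} ohm")
```

Both filters resonate close to 7 MHz, but the 5uH/100pF version has roughly six times the reactance at resonance (about 224 ohms against 37 ohms). It therefore loads the line much less away from the notch, which is consistent with the better downstream rate.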

The inductor is 9 turns on a F37-61 core. Make sure you use a material suitable for high Q inductors. I initially used the wrong core material and couldn’t get a decent notch. I recommend you check the tuning – it should have a deep notch at around your operating frequency.

Here is a sweep of my filter:

I put 560 ohm resistors in series with the tracking generator output and spectrum analyser input to approximate the line impedance using this jig:

Here is a plot of the system in action:

The yellow plot is the original, unfiltered VDSL signal. At the same time I’m transmitting SSB. You can see my SSB signal on 7.18 MHz (yellow peak above the “1”).

Purple is with the series LC notch filter installed. You can see the notch left of the “1” at 7MHz. The “node” has worked out 7-8MHz is a dud band so isn’t sending any information. So nothing to interfere with when I PTT SSB. I’m not sending a SSB signal in this plot.

Note also the 8-12 MHz purple (filtered) is just a few dB lower than the yellow (unfiltered). So the notch filter isn’t wiping out the HF signals.

These plots show a mixture of Tx (-10dBm) from my modem, and Rx (underlying gentle downwards slope) – the signal from the “node”. I assume it’s full duplex, we just can’t see the Rx signal most of the time. I am sampling the combined signal next to the modem, so Tx dominates. You can see the Rx signal better when the modem is training.

For some reason my modem doesn’t Tx in the 6-8MHz band. Probably a good thing for RFI.

Results

Without the filter I get immediate interruptions to pings and loss of modem sync after 20 seconds. With the filter I’ve hammered it for the last few weeks with SSB and FreeDV signals but no interruptions in pings or the received audio and waterfall from a local KiwiSDR.

There is a hit on my downstream bandwidth, but it’s not significant for me. Much nicer to be able to transmit on 40m and not have the Internet break!

Here is the finished filter, installed near the modem in some heat shrink:

I’d be interested to see if this idea will work at other sites. Due to the random nature of the phone lines no two VDSL installations are the same.

The design is simply a 5uH inductor in series with a 100pF capacitor, connected across the VDSL phone lines. You need a high Q inductor, I used 9 turns on a F37-61 core. If possible, carefully check the tuning of the notch filter.

I’ve also seen suggestions of using a quarter wave stub (about 10m of phone cable) to get the same effect. This is a neat idea, as you could just buy a 10m phone extension lead, and plug it in parallel with your VDSL line. However once again – carefully check the tuning of the stub – phone cable is messy, uncalibrated stuff!
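
To see why the stub length needs checking, here is a rough calculation of a quarter wavelength at 7.1 MHz. The velocity factors in the loop are illustrative assumptions, not measurements of any particular phone cable; the physical length of a working stub scales directly with whatever the real figure turns out to be.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def quarter_wave_stub_length_m(freq_hz: float, velocity_factor: float = 1.0) -> float:
    """Physical length of a quarter-wave stub for a given cable velocity factor."""
    return velocity_factor * SPEED_OF_LIGHT_M_S / (4.0 * freq_hz)


TARGET_HZ = 7.1e6  # roughly the middle of the 40m band

# Hypothetical velocity factors, just to show the sensitivity.
for vf in (1.0, 0.9, 0.8, 0.7):
    length_m = quarter_wave_stub_length_m(TARGET_HZ, vf)
    print(f"velocity factor {vf:.1f}: {length_m:.2f} m")
```

A velocity factor of 1.0 (free space) gives a little over 10.5 m, and any real cable will come out shorter than that.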

This was an interesting little project, with a satisfying result. I quite like learning about RF, and (re) learnt about the trade-offs around reactance at resonance, transmission lines, and inductor core material.

Thanks for help and useful comments from AREG members on their mailing list. Several other AREG members are also suffering from the same problem, so I imagine it’s widespread in other countries that use VDSL.

September 13, 2019

On the airwaves

As of this year, I’m now an amateur radio operator! Callsign VK2FAAS, foundation licence. It’s something I’ve always had an interest in doing. As a kid, I had some toy 27 MHz radios with barely 20 metres of range. Then, I got a job working as a sysadmin at a wireless ISP where we built long-distance wireless networks. And, while at LCA2013 I attended a ham radio BoF (“birds of a feather”) session, where some operators made a DX (long distance) contact by fashioning an antenna out of some wire tied to a tree.

September 11, 2019

Deploying and Configuring Vim on NixOS

NixOS Gears by Craige McWhirter

I had a need to deploy vim and my particular preferred configuration both system-wide and across multiple systems (via NixOps).

I started by creating a file named vim.nix that would be imported into either /etc/nixos/configuration.nix or an appropriate NixOps Nix file. This example is a stub that shows a number of common configuration items:

vim.nix:

with import <nixpkgs> {};

vim_configurable.customize {
  name = "vim";   # Specifies the vim binary name.
  # Below you can specify what usually goes into `~/.vimrc`
  vimrcConfig.customRC = ''
    " Preferred global default settings:
    set number                    " Enable line numbers by default
    set background=dark           " Set the default background to dark or light
    set smartindent               " Automatically insert extra level of indentation
    set tabstop=4                 " Default tabstop
    set shiftwidth=4              " Default indent spacing
    set expandtab                 " Expand [TABS] to spaces
    syntax enable                 " Enable syntax highlighting
    colorscheme solarized         " Set the default colour scheme
    set t_Co=256                  " use 256 colors in vim
    set spell spelllang=en_au     " Default spell checking language
    hi clear SpellBad             " Clear any unwanted default settings
    hi SpellBad cterm=underline   " Set the spell checking highlight style
    hi SpellBad ctermbg=NONE      " Set the spell checking highlight background
    match ErrorMsg '\s\+$'        " Highlight trailing whitespace

    let g:airline_powerline_fonts = 1   " Use powerline fonts
    let g:airline_theme='solarized'     " Set the airline theme

    set laststatus=2   " Set up the status line so it's coloured and always on

    " Add more settings below
  '';
  # store your plugins in Vim packages
  vimrcConfig.packages.myVimPackage = with pkgs.vimPlugins; {
    start = [               # Plugins loaded on launch
      airline               # Lean & mean status/tabline for vim that's light as air
      solarized             # Solarized colours for Vim
      vim-airline-themes    # Collection of themes for airline
      vim-nix               # Support for writing Nix expressions in vim
    ];
    # manually loadable by calling `:packadd $plugin-name`
    # opt = [ phpCompletion elm-vim ];
    # To automatically load a plugin when opening a filetype, add vimrc lines like:
    # autocmd FileType php :packadd phpCompletion
  };
}

I then needed to import this file into my system packages stanza:

  environment = {
    systemPackages = with pkgs; [
      someOtherPackages   # Normal package listing
      (
        import ./vim.nix
      )
    ];
  };

This will then install and configure Vim as you've defined it.

If you'd like to give this build a run in a non-production space, I've written vim_vm.nix with which you can build a VM, ssh into afterwards and test the Vim configuration:

$ nix-build '<nixpkgs/nixos>' -A vm --arg configuration ./vim_vm.nix
...
$ export QEMU_OPTS="-m 4192"
$ export QEMU_NET_OPTS="hostfwd=tcp::18080-:80,hostfwd=tcp::10022-:22"
$ ./result/bin/run-vim-vm-vm

Then, from another terminal:

$ ssh nixos@localhost -p 10022

And you should be in a freshly baked NixOS VM with your Vim config ready to be used.

There's an always current example of my production Vim configuration in my mio-ops repo.

September 10, 2019

Deploying Gitea on NixOS

NixOS Gitea by Craige McWhirter

I've been using GitLab for years but recently opted to switch to Gitea, primarily because of timing and I was looking for something more lightweight, not because of any particular problems with GitLab.

To deploy Gitea via NixOps I chose to craft a Nix file (example) that would be included in a host definition. The definition linked above and shown below provides a deployment of Gitea using Postgres, Nginx, ACME certificates and reStructuredText rendering with syntax highlighting.

version-management/gitea_for_NixOps.nix:

    { config, pkgs, lib, ... }:

    {

      services.gitea = {
        enable = true;                               # Enable Gitea
        appName = "MyDomain: Gitea Service";         # Give the site a name
        database = {
          type = "postgres";                         # Database type
          passwordFile = "/run/keys/gitea-dbpass";   # Where to find the password
        };
        domain = "source.mydomain.tld";              # Domain name
        rootUrl = "https://source.mydomain.tld/";    # Root web URL
        httpPort = 3001;                             # Provided unique port
        extraConfig = let
          docutils =
            pkgs.python37.withPackages (ps: with ps; [
              docutils                               # Provides rendering of ReStructured Text files
              pygments                               # Provides syntax highlighting
          ]);
        in ''
          [mailer]
          ENABLED = true
          FROM = "gitea@mydomain.tld"
          [service]
          REGISTER_EMAIL_CONFIRM = true
          [markup.restructuredtext]
          ENABLED = true
          FILE_EXTENSIONS = .rst
          RENDER_COMMAND = ${docutils}/bin/rst2html.py
          IS_INPUT_FILE = false
        '';
      };

      services.postgresql = {
        enable = true;                # Ensure postgresql is enabled
        authentication = ''
          local gitea all ident map=gitea-users
        '';
        identMap =                    # Map the gitea user to postgresql
          ''
            gitea-users gitea gitea
          '';
      };

      services.nginx = {
        enable = true;                                          # Enable Nginx
        recommendedGzipSettings = true;
        recommendedOptimisation = true;
        recommendedProxySettings = true;
        recommendedTlsSettings = true;
        virtualHosts."source.mydomain.tld" = {                  # Gitea hostname
          enableACME = true;                                    # Use ACME certs
          forceSSL = true;                                      # Force SSL
          locations."/".proxyPass = "http://localhost:3001/";   # Proxy Gitea
        };
      };

      security.acme.certs = {
          "source.mydomain.tld".email = "anEmail@mydomain.tld";
      };

    }

This line from the above file should stand out:

              passwordFile = "/run/keys/gitea-dbpass";   # Where to find the password

Where does that file come from? It's pulled from a secrets.nix file (example) that for this example, could look like this:

secrets.nix:

    { config, pkgs, ... }:

    {
      deployment.keys = {
        # An example set of keys to be used for the Gitea service's DB authentication
        gitea-dbpass = {
          text        = "uNgiakei+x>i7shuiwaeth3z";   # Password, generated using pwgen -yB 24
          user        = "gitea";                      # User to own the key file
          group       = "wheel";                      # Group to own the key file
          permissions = "0640";                       # Key file permissions
        };
      };
    }

The file's path, /run/keys/gitea-dbpass, is determined by the elements of the key definition: deployment.keys determines the initial path of /run/keys, and the next element, gitea-dbpass, is a descriptive name provided by the stanza's author to describe the key's use and also provide the final file name.

Now that we have described the Gitea service in gitea_for_NixOps.nix and the required credentials in secrets.nix we need to pull it all together for deployment. We achieve that in this case by importing both these files into our existing host definition:

myhost.nix:

    {
      myhost =
        { config, pkgs, lib, ... }:

        {

          imports =
            [
              ./secrets.nix                               # Import our secrets
              ./version-management/gitea_for_NixOps.nix   # Import Gitea
            ];

          deployment.targetHost = "192.168.132.123";   # Target's IP address

          networking.hostName = "myhost";              # Target's hostname.
        };
    }

To deploy Gitea to your NixOps managed host, you merely run the deploy command for your already configured host and deployment, which would look like this:

    $ nixops deploy -d MyDeployment --include myhost

You should now have a running Gitea server and be able to create an initial admin user.

In my nixos-examples repo I have a version-management directory with some example files and a README with information and instructions. You can use two of the files to generate a Gitea VM to take a quick poke around. There is also an example of how you can deploy Gitea in production using NixOps, as per this post.

If you wish to dig a little deeper, I have my production deployment over at mio-ops.

September 09, 2019

Monitoring OpenWrt with collectd, InfluxDB and Grafana

In my previous blog post I showed how to set up InfluxDB and Grafana (and Prometheus). This is how I configured my OpenWrt devices to provide monitoring and graphing of my network.

OpenWrt includes support for collectd (and even graphing inside Luci web interface) so we can leverage this and send our data across the network to the monitoring host.

OpenWrt stats in Grafana

Install and configure packages on OpenWrt

Log into your OpenWrt devices and install the required packages.

opkg update
opkg install luci-app-statistics collectd collectd-mod-cpu \
collectd-mod-interface collectd-mod-iwinfo \
collectd-mod-load collectd-mod-memory collectd-mod-network collectd-mod-uptime
/etc/init.d/luci_statistics enable
/etc/init.d/collectd enable

Next, log into your device’s OpenWrt web interface and you should see a new Statistics menu at the top. Hover over this and click on Setup so that we can configure collectd.

Add the Hostname field and enter in the device’s hostname (or some name you want).

Click on General plugins and make sure that Processor, System Load, Memory and Uptime are all enabled. Hit Save & Apply.

Under Network plugins, ensure Interfaces is enabled and select the interfaces you want to monitor (lan, wan, wifi, etc).

Still under Network plugins, also ensure Wireless is enabled but don’t select any interfaces (it will work it out). Hit Save & Apply (I don’t bother with the Ping plugin).

Click on Output plugins and ensure Network is enabled so that we can stream metrics to InfluxDB. All you need to do is add an entry under server interfaces that points to the IP address of your monitor server (which is running InfluxDB with the collectd listener enabled). Hit Save & Apply.

Finally, you can leave the RRDTool plugin as it is, or disable it if you want to (it will stop showing graphs in Luci if you do, but we’re using Grafana anyway and you’ll have less load on your router). If you do enable it, make sure it is writing data to tmpfs to avoid wearing out your flash (this is the default configuration).

That’s your OpenWrt configuration done!

Loading a dashboard in Grafana

Still in your web browser, log into Grafana on your monitor node (port 3000 by default).

Import a new dashboard.

We will use an existing dashboard by contributor vooon341, so simply type in the number 3484 and hit Load.

This will download the dashboard from Grafana and prompt for settings. Enter whatever Name you like, select InfluxDB as your data source (configured in the previous blog post), then hit Import.

Grafana will now go and query InfluxDB and present your dashboard with all of your OpenWrt devices.

OpenWrt also supports a Lua Prometheus node exporter, so if you wanted to add those as well, you could. However, I think collectd does a reasonable job.

September 08, 2019

Setting up a monitoring host with Prometheus, InfluxDB and Grafana

Prometheus and InfluxDB are powerful time series databases for monitoring, both of which are natively supported by the graphing tool Grafana.

Setting up these simple but powerful open source tools gives you a great base for monitoring and visualising your systems. We can use agents like node-exporter to publish metrics on remote hosts which Prometheus will scrape, and other tools like collectd which can send metrics to InfluxDB’s collectd listener (as per my post about OpenWRT).

Prometheus’ node exporter metrics in Grafana

I’m using CentOS 7 on a virtual machine, but this should be similar to other systems.

Install Prometheus

Prometheus is the trickiest to install, as there is no Yum repo available. You can either download the pre-compiled binary or run it in a container; I’ll do the latter.

Install Docker and pull the image (I’ll use Quay instead of Dockerhub).

sudo yum install docker
sudo systemctl start docker
sudo systemctl enable docker
sudo docker pull quay.io/prometheus/prometheus

Let’s create a directory for Prometheus configuration files which we will pass into the container.

sudo mkdir /etc/prometheus.d

Let’s create the core configuration file. This file will set the scraping interval (under global) for Prometheus to pull data from client endpoints and is also where we configure those endpoints (under scrape_configs). As we will enable node-exporter on the monitoring node itself later, let’s add it as a localhost target.

cat << EOF | sudo tee /etc/prometheus.d/prometheus.yml
global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
    - targets:
      - localhost:9100
EOF

Now we can start a persistent container. We’ll pass in the config directory we created earlier but also a dedicated volume so that the database is persistent across updates. We use host networking so that Prometheus can talk to localhost to monitor itself (not required if you want to configure Prometheus to talk to the host’s external IP instead of localhost).

Pass in the path to any custom CA Certificate as a volume (example below) for any end points you require. If you want to run this behind a reverse proxy, then set web.external-url to the hostname and port (leave it off if you don’t).

Note that enabling the admin-api and lifecycle options will allow anyone on your network to perform those functions, so you may want to only allow that if your network is trusted. Otherwise you should probably put those behind an SSL enabled, password protected webserver (out of scope for this post).

Note also that some volumes have either :z or :Z option appended to them, this is to set the SELinux context for the container (shared vs exclusive, respectively).

sudo docker run \
--detach \
--interactive \
--tty \
--network host \
--name prometheus \
--restart always \
--publish 9090:9090 \
--volume prometheus:/prometheus \
--volume /etc/prometheus.d:/etc/prometheus.d:Z \
--volume /path/to/ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:z \
quay.io/prometheus/prometheus \
--config.file=/etc/prometheus.d/prometheus.yml \
--web.external-url=http://$(hostname -f):9090 \
--web.enable-lifecycle \
--web.enable-admin-api

Check that the container is running properly, it should say that it is ready to receive web requests in the log. You should also be able to browse to the endpoint on port 9090 (you can run queries here, but we’ll use Grafana).

sudo docker ps
sudo docker logs prometheus

Updating Prometheus config

Updating and reloading the config is easy, just edit /etc/prometheus.d/prometheus.yml and restart the container. This is useful when adding new nodes to scrape metrics from.

sudo docker restart prometheus

You can also send a message to Prometheus to reload (if you enabled this by web.enable-lifecycle option).

curl -s -XPOST localhost:9090/-/reload

In the container log (as above) you should see that it has reloaded the config.

Installing Prometheus node exporter

You’ll notice in the Prometheus configuration above we have a job called node and a target for localhost:9100. This is a simple way to start monitoring the monitor node itself! Installing the node exporter in a container is not recommended, so we’ll use the Copr repo and install with Yum.

sudo curl -Lo /etc/yum.repos.d/_copr_ibotty-prometheus-exporters.repo \
https://copr.fedorainfracloud.org/coprs/ibotty/prometheus-exporters/repo/epel-7/ibotty-prometheus-exporters-epel-7.repo

sudo yum install node_exporter
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

It should be listening on port 9100 and Prometheus should start getting metrics from http://localhost:9100/metrics automatically (we’ll see them later with Grafana).
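
If you'd rather check from a script than wait for Grafana, Prometheus exposes target status over its HTTP API at /api/v1/targets. The following is a minimal sketch, assuming Prometheus is reachable on localhost:9090 as configured above; the helper names and the SAMPLE payload are mine, with SAMPLE trimmed to just the fields the parser reads so it can be tried offline.

```python
import json
import urllib.request


def summarise_targets(payload: dict) -> dict:
    """Map 'job/instance' to health from a /api/v1/targets response body."""
    summary = {}
    for target in payload.get("data", {}).get("activeTargets", []):
        labels = target.get("labels", {})
        key = f"{labels.get('job', '?')}/{labels.get('instance', '?')}"
        summary[key] = target.get("health", "unknown")
    return summary


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the live target list (needs a running Prometheus)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


# Trimmed, hand-written sample in the shape the API returns.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "localhost:9090"}, "health": "up"},
            {"labels": {"job": "node", "instance": "localhost:9100"}, "health": "up"},
        ]
    },
}

for name, health in summarise_targets(SAMPLE).items():
    print(f"{name}: {health}")
```

Against a live server, replace SAMPLE with fetch_targets(); both jobs from prometheus.yml should report a health of "up".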

Install InfluxDB

Influxdata provides a yum repository so installation is easy!

cat << \EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name=InfluxDB
baseurl=https://repos.influxdata.com/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://repos.influxdata.com/influxdb.key
EOF
sudo yum install influxdb

The defaults are fine, other than enabling collectd support so that other clients can send metrics to InfluxDB. I’ll show you how to use this in another blog post soon.

sudo sed -i 's/^\[\[collectd\]\]/#\[\[collectd\]\]/' /etc/influxdb/influxdb.conf
cat << EOF | sudo tee -a /etc/influxdb/influxdb.conf
[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd"
  retention-policy = ""
  typesdb = "/usr/local/share/collectd"
  security-level = "none"
EOF

This should open a number of ports, including InfluxDB itself on TCP port 8086 and collectd receiver on UDP port 25826.

sudo ss -ltunp |egrep "8086|25826"

Create InfluxDB collectd database

Finally, we need to connect to InfluxDB and create the collectd database. Just run the influx command.

influx

And at the prompt, create the database and exit.

CREATE DATABASE collectd
exit
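
If you are scripting the setup, the same statement can be sent to InfluxDB's HTTP /query endpoint instead of using the interactive shell. This is a sketch only, assuming an InfluxDB 1.x server on localhost:8086 with authentication left at its defaults; build_create_db_request is a name I made up, and the request is built but not sent so you can inspect it first.

```python
import urllib.parse
import urllib.request


def build_create_db_request(database: str, base_url: str = "http://localhost:8086"):
    """Build (but do not send) a POST to /query that creates a database."""
    statement = f'CREATE DATABASE "{database}"'
    body = urllib.parse.urlencode({"q": statement}).encode("ascii")
    return urllib.request.Request(f"{base_url}/query", data=body, method="POST")


request = build_create_db_request("collectd")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))

# To actually run it against a live server:
#   with urllib.request.urlopen(request, timeout=5) as response:
#       print(response.read().decode())
```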

Install Grafana

Grafana has a Yum repository so it’s also pretty trivial to install.

cat << EOF | sudo tee /etc/yum.repos.d/grafana.repo
[grafana]
name=Grafana
baseurl=https://packages.grafana.com/oss/rpm
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
EOF
sudo yum install grafana

Grafana pretty much works out of the box and can be configured via the web interface, so simply start and enable it. The server listens on port 3000 and the default username is admin with password admin.

sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo ss -ltnp |grep 3000

Now you’re ready to log into Grafana!

Configuring Grafana

Browse to the IP of your monitoring host on port 3000 and log into Grafana.

Now we can add our two data sources. First, Prometheus, pointing to localhost on port 9090.

..and then InfluxDB, pointing to localhost on port 8086 and to the collectd database.
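
The web interface is the simplest way to do this, but the two data sources can also be created through Grafana's HTTP API (POST to /api/datasources). The sketch below only builds the requests. It assumes the default admin/admin login and the localhost ports used in this post. The top-level "database" field is how I understand Grafana versions of this era to take the InfluxDB database name, and newer releases may expect it elsewhere, so treat the payloads as a starting point.

```python
import base64
import json
import urllib.request

# Payloads mirroring the two data sources described in the post.
PROMETHEUS_SOURCE = {
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://localhost:9090",
    "access": "proxy",
}
INFLUXDB_SOURCE = {
    "name": "InfluxDB",
    "type": "influxdb",
    "url": "http://localhost:8086",
    "access": "proxy",
    "database": "collectd",  # assumption: field placement varies by Grafana version
}


def datasource_request(payload, grafana_url="http://localhost:3000",
                       user="admin", password="admin"):
    """Build (but do not send) a basic-auth POST to /api/datasources."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


for source in (PROMETHEUS_SOURCE, INFLUXDB_SOURCE):
    req = datasource_request(source)
    print(req.get_method(), req.full_url, json.loads(req.data)["name"])

# To send one: urllib.request.urlopen(datasource_request(PROMETHEUS_SOURCE), timeout=5)
```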

Adding a Grafana dashboard

Make sure they tested OK and we’re well on our way. Next we just need to create some dashboards, so let’s get a dashboard to show node exporter and we’ll hopefully at least see the monitoring host itself.

Go to Dashboards and hit import.

Type the number 1860 in the dashboard field and hit load.

This should automatically download and load the dash, all you need to do is select your Prometheus data source from the Prometheus drop down and hit Import!

Next you should see the dashboard with metrics from your monitor node.

So there you go, you’re on your way to monitoring all the things! For anything that supports collectd, you can forward metrics to UDP port 25826 on your monitor node. More on that later…

September 05, 2019

Why Computers Lie Badly At Alarming Speed and the unum Promise

Translating arithmetic into physical hardware using the numerical representation employed by the IEEE standard is fraught with difficulty. As is well known by anyone who has used even a pocket calculator, computer processors are imprecise, with dangerous rounding errors that vary between systems. Further, the standard representation method, IEEE 754 "Standard for Floating-Point Arithmetic" (1985, revised 2008), is extremely inefficient from an engineering perspective, with increasing physical cost when additional precision is sought.

The basic issue is the limitation of converting decimal or floating point notation into binary form. The IEEE standard suggests that when a calculation overflows the value +inf should be used instead, and when a number is too small the standard says to use 0 instead. Inserting infinity to represent "a very big number" or 0 to represent a "very small number" will certainly cause computational issues. Floating point operations have additional issues when employed in parallel, breaking the associative property of addition. The expression (a + b) + (c + d) evaluated in parallel will not necessarily equal ((a + b) + c) + d evaluated in serial.
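
These effects are easy to reproduce with ordinary IEEE 754 double precision. The following short Python sketch is my own illustration rather than material from the talk; the values are chosen so that the two groupings of the same four numbers disagree with each other and with the exact sum.

```python
import math

# Same four numbers, grouped as a parallel reduction and as a serial sum.
a, b, c, d = 1e16, 1.0, -1e16, 1.0

pairwise = (a + b) + (c + d)      # "parallel" grouping
serial = ((a + b) + c) + d        # "serial" left-to-right grouping
exact = math.fsum([a, b, c, d])   # correctly rounded sum of the exact values

print("pairwise:", pairwise)
print("serial:  ", serial)
print("exact:   ", exact)

# Overflow is replaced by infinity, and underflow by zero.
overflowed = 1e308 * 10.0
underflowed = 1e-200 * 1e-200

print("overflow: ", overflowed)
print("underflow:", underflowed)
```

At a magnitude of 1e16 the spacing between adjacent doubles is 2, so adding 1.0 is lost to rounding, and whether it is lost once or twice depends on the order of operations.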

These issues have been known in computer science for some decades (Goldberg, 1991). In recent years an attempt has been made to rework the implementation of arithmetic in physical hardware by providing a superset of the IEEE 754 standard and IEEE 1788, Standard for Interval Arithmetic. This number format, the Unum (Gustafson, 2015), consists of a bit string of variable length with six sub-fields: a sign bit, exponent, fraction, uncertainty bit, exponent size, and fraction size. The uncertainty bit, or ubit, specifies whether or not there are additional bits after the fraction; instead of rounding, the value then denotes a precise interval. This means that numbers that are close to zero or infinity are treated as such and are never represented as zero or infinity. To date, Unums have not been translated into hardware as they require more logic than floating-point numbers, but software logic has been provided.

Why Computers Lie Badly At Alarming Speed and the unum Promise
Challenges in High Performance Computing Conference
2-6 Sept, 2019 Mathematical Sciences Institute, Australian National University
http://levlafayette.com/files/2019ChallHPC-unums.pdf

September 04, 2019

Install newer git from software collections and enable globally

Work on Linux almost always means git for me, but the version provided by CentOS and RHEL is too old. Software collections is a convenient way to get a newer version and enable it for everyone by default.

First, enable software collections (different for RHEL and CentOS).

# CentOS
sudo yum install centos-release-scl
# RHEL
sudo yum-config-manager --enable rhel-server-rhscl-7-rpms

Install the newer version of git you want (e.g. git 2.18).

sudo yum install rh-git218

Enable it for everyone for any new sessions.

cat << EOF | sudo tee /etc/profile.d/git-scl.sh
source scl_source enable rh-git218
EOF

Test with a new session.

git --version
$SHELL
git --version