Ben Burwell

Avoid speculative error handling

2023-01-04T00:00:00Z

A general heuristic I’ve learned from reading, writing, and debugging a lot of code is: don’t try to predict whether a future operation will fail. This comes up a lot when handling error conditions.

For example, I’ve seen plenty of code that reads something like this (pseudocode):

if not file_exists("input"):
  print("Input file not found")
else:
  input = open("input")

# or...

if not server_online("192.168.32.7"):
  print("Server is offline")
else:
  get_data("192.168.32.7")

It’s great to show friendly error messages like “Input file not found” or “Server is offline”, but the order of operations above is buggy!

The problem with the first case is that even if the file exists when it’s first checked, someone could delete or move it before the second statement executes. In the second case, a network cable could be unplugged.

It would be better to write the code as:

try input = open("input")
catch FileNotFound:
  print("Input file not found")

# and

try get_data("192.168.32.7")
catch ServerOffline:
  print("Server is offline")

Instead of checking whether something might not work before deciding whether to try at all, it’s usually better to just do the thing with the expectation that it might not work.
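As a concrete version of the second pseudocode block, here is a minimal Go sketch. The readInput helper and its messages are made up for illustration; the point is only that the read is attempted first and the error is classified afterwards with errors.Is, so there is no gap between a check and the operation it was meant to protect.

```go
package main

import (
  "errors"
  "fmt"
  "io/fs"
  "os"
)

// readInput (a hypothetical helper) attempts the read directly and
// classifies the failure afterwards, instead of checking first whether
// the file exists.
func readInput(path string) ([]byte, error) {
  data, err := os.ReadFile(path)
  if errors.Is(err, fs.ErrNotExist) {
    // The friendly message is produced from the real failure.
    return nil, fmt.Errorf("input file not found: %s", path)
  }
  if err != nil {
    // Anything else: permission problems, I/O errors, and so on.
    return nil, fmt.Errorf("could not read %s: %w", path, err)
  }
  return data, nil
}

func main() {
  // "input" is the file name used in the pseudocode above.
  data, err := readInput("input")
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Printf("read %d bytes\n", len(data))
}
```

A side benefit of this shape is that failures the pre-check never anticipated (such as a permissions error) are still reported instead of surfacing as a crash later on.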

There are situations where it really does make sense to try to catch things early, but it seems much more common for this type of “speculative error handling” to cause problems than provide solutions.

How I Connect to Postgres Databases

2022-12-13T15:22:29Z

I often need to connect to PostgreSQL databases for projects I'm working on, and over time I've developed a method that works pretty well for me. It's pretty specific to how I like to work so I wouldn't recommend it for everyone. But since some of my coworkers have asked about it, I figured I'd write down the major pieces of the puzzle so others can adapt any parts they like to their own workflows.

For starters, I almost exclusively use the psql command line client. If you don't use psql, then most of this is probably not relevant to you. Otherwise, keep reading!

Throughout this page, I'll pretend that there's a database server that we want to connect to called db1.internal.net that listens on the default port of 5432.

Using `.pg_service.conf`

When you use psql to connect, you can pass a connection URL, CLI flags, or a series of libpq options:

$ psql 'host=db1.internal.net user=app dbname=db password=sesame port=5432'

If you're frequently connecting to the same database, it can be a little annoying to constantly type in all those parameters or try to find them in your shell history. To make life easier, you can put the libpq options for your frequently-used connections into a service file.

By default, libpq tries to load service definitions from ~/.pg_service.conf. This file uses an INI-style format, and can be populated with services like this:

[db1]   # <-- name of the service
host=db1.internal.net
port=5432
user=app
dbname=db
password=sesame

Once you create this file, you can connect with psql by simply referencing the service name:

$ psql 'service=db1'

Storing passwords in `.pgpass`

If you need to use password authentication for your database, and you don't want to keep your passwords in ~/.pg_service.conf, you can use a separate ~/.pgpass file.

Each line of this password file describes the password to use for a particular database connection or connections. The entry format is colon-delimited host:port:database:username:password. For example, to connect to our db1 service, you could remove password=sesame from ~/.pg_service.conf and instead add the following line to ~/.pgpass:

db1.internal.net:5432:db:app:sesame

You can also use * as a wildcard for any of the fields, e.g. db1.internal.net:5432:*:app:sesame means to use password sesame to connect as the app user to any database on db1.internal.net:5432.
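To make the matching rule concrete, here is a rough Go sketch of how a lookup against these entries could work. This is not libpq's code: lookupPassword is a made-up name, and the sketch deliberately ignores the backslash escaping that real .pgpass files allow. It does follow the documented behavior that the first matching line wins.

```go
package main

import (
  "fmt"
  "strings"
)

// lookupPassword returns the password from the first line whose host,
// port, database, and user fields each equal the requested value or "*".
// Simplification: backslash-escaped colons are not handled.
func lookupPassword(lines []string, host, port, db, user string) (string, bool) {
  want := []string{host, port, db, user}
  for _, line := range lines {
    if strings.HasPrefix(line, "#") {
      continue // comment line
    }
    fields := strings.SplitN(line, ":", 5)
    if len(fields) != 5 {
      continue // malformed line
    }
    matched := true
    for i, w := range want {
      if fields[i] != "*" && fields[i] != w {
        matched = false
        break
      }
    }
    if matched {
      return fields[4], true
    }
  }
  return "", false
}

func main() {
  lines := []string{"db1.internal.net:5432:*:app:sesame"}
  pw, ok := lookupPassword(lines, "db1.internal.net", "5432", "reports", "app")
  fmt.Println(pw, ok)
}
```

Because the first match wins, more specific entries belong above wildcard entries in the file.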

Port forwarding with SSH

Often, the databases you need to connect to aren't directly available, and you need to connect through a bastion host of some kind. For example, maybe we can only connect to db1.internal.net after we SSH into an internal network.

For the sake of example, we'll imagine that there is a server called ssh.public.net that we can SSH into when we want to connect to our db1 service.

We can forward a local port through an SSH tunnel by passing the -L option to ssh:

$ ssh -L 15432:db1.internal.net:5432 ben@ssh.public.net -p 2222

This connects to ssh.public.net on port 2222 and sets up a socket on your local machine bound to port 15432. Any connections you make to that port are forwarded over the SSH channel, and then on to db1.internal.net:5432 from the remote machine you're SSH'd into.

This means that we can now connect to db1.internal.net using psql by making a connection to localhost:15432. This can be wrapped up as an entry in your ~/.pg_service.conf file where instead of listing db1.internal.net:5432, you list localhost:15432:

[db1]
host=localhost
port=15432
user=app
dbname=db
password=sesame

Using `~/.ssh/config`

Instead of needing to remember to use ben as the username for ssh.public.net, and that sshd is actually listening on port 2222, you can add an entry to ~/.ssh/config similar to the way the Postgres service file works:

Host ssh.public.net
  User ben
  Port 2222

Now, you can omit the username and port and simply:

$ ssh -L 15432:db1.internal.net:5432 ssh.public.net

You can actually make the Host label anything you want; it doesn't need to be the real name of the server. This can be useful if you don't actually have a DNS name to connect to and you don't want to remember the IP address:

Host my-internal-net
  User ben
  Port 2222
  HostName 192.0.32.7

Headless ssh with a control socket

So now we can connect fairly easily to our database:

1. Run ssh -L 15432:db1.internal.net:5432 ssh.public.net.
2. In a separate window, run psql service=db1.
3. When you are done with your psql session, use ^D to log out from the SSH connection.

This works pretty well, but for frequently used connections, it'd be even nicer to just have one command to run and not need to deal with multiple shell sessions.

Luckily, ssh connections can be controlled headlessly through a Unix control socket. Here's what this looks like:

$ ssh -M -S conn.sock -fnNT -L 15432:db1.internal.net:5432 ssh.public.net
$ psql service=db1
$ ssh -S conn.sock -O exit ssh.public.net

In the first command, we establish the SSH connection and specify conn.sock as the control socket for connection sharing. We also use the -f option so that ssh will go to background just before command execution. (You can read more about the other options in the ssh(1) manpage, but they basically prevent SSH from starting an actual console session on the remote host so we're only doing the port forwarding.)

Once the connection is established, we can run psql as usual, and forward our Postgres traffic over the established SSH connection.

Finally, when we're done with psql, we can have ssh send the exit control command over conn.sock to close the SSH connection.

Tying it all together

I tend to wrap all of this up in a short shell script named something like db1-psql. The scripts look pretty much like what I described above:

#!/bin/sh

SOCKET=db1-ssh.sock
LOCAL_PORT=15432
REMOTE_DB_HOST=db1.internal.net
REMOTE_DB_PORT=5432
SSH_HOST=ssh.public.net

ssh -M -S "$SOCKET" -fnNT -L "$LOCAL_PORT:$REMOTE_DB_HOST:$REMOTE_DB_PORT" "$SSH_HOST"
psql service=db1
ssh -S "$SOCKET" -O exit "$SSH_HOST"

With this in place, and the db1-psql script in my $PATH (usually for me this means dropping it in ~/.bin/), I can connect to the database by simply running:

$ db1-psql

There are lots of ways to connect to databases, but this is what I've found works well for me. Feel free to take any bits and pieces of this that you like and use them in your own workflow!

Transactions Are Not Locks

2022-03-29T00:00:00Z

One thing I wish I had understood better earlier on in my experience with PostgreSQL is how transactions and locks can be used together to provide serializable logic.

An easy way to illustrate this is with a simple bank account system. Suppose we create an accounts table and populate it like this:

create table accounts (
  name text primary key,
  balance int not null
);
insert into accounts (name, balance) values ('A', 10), ('B', 0);

Now we have two bank accounts, A with a balance of $10, and B with a balance of $0.

In order to be a useful bank, we want to be able to move money from one account to another. In pseudocode, the way to move money from one account to another might look something like:

function moveMoney(from, to, amount):
  # Start a transaction.
  txn = db.begin()
  # Update the balances.
  txn.execute('update accounts set balance = balance - $amount where name = $from')
  txn.execute('update accounts set balance = balance + $amount where name = $to')
  # Commit the transaction.
  txn.commit()

We use a transaction here to make sure that either both updates succeed, or both updates fail. In other words, we want to avoid the situation where money is deducted from A but never deposited to B.

There’s another situation that we might want to avoid in our bank too: we might want a rule that account balances can never be negative. To enforce this rule, we can update our moveMoney function:

function moveMoney(from, to, amount):
  # Moving a negative amount of money from A to B is equivalent to moving the
  # corresponding positive amount from B to A.
  if amount < 0:
    moveMoney(to, from, -1*amount)
    return

  # Start a transaction so that all of our queries/updates succeed or fail as a
  # unit.
  txn = db.begin()

  # Make sure the $from account has a balance of at least $amount.
  currBalance = txn.query('select balance from accounts where name = $from')
  if currBalance < amount:
    txn.rollback()
    throw exception

  # Move the money as before.
  txn.execute('update accounts set balance = balance - $amount where name = $from')
  txn.execute('update accounts set balance = balance + $amount where name = $to')

  # Commit the transaction.
  txn.commit()

But there’s a problem with this! Using a transaction only ensures that all of the writes succeed or fail together; it does not provide any guarantee that all of the statements in the transaction execute “at the same time” (i.e. the transactions are not serializable).

Preventing concurrency bugs

Let’s simulate two different actors calling moveMoney('A', 'B', 10) concurrently, again with A having an initial balance of $10 and B having $0:

| Actor 1 | Actor 2 |
| --- | --- |
| `begin` | |
| `select balance from accounts where name = 'A'` | |
| | `begin` |
| | `select balance from accounts where name = 'A'` |
| `update accounts set balance = balance - 10 where name = 'A'` | |
| `update accounts set balance = balance + 10 where name = 'B'` | |
| `commit` | |
| | `update accounts set balance = balance - 10 where name = 'A'` |
| | `update accounts set balance = balance + 10 where name = 'B'` |
| | `commit` |

Now, if we check the account balances, we can see a problem:

postgres=# select * from accounts ;
 name | balance
------+---------
 A    |     -10
 B    |      20

Both actors read the initial balance as $10, and therefore allowed the operations to proceed. The transaction is ensuring that $10 is deducted from A if and only if $10 is deposited into B, but two transactions can still be reading and making decisions based on the same data concurrently.

(PostgreSQL by default does not allow two transactions to write the same data concurrently; after Actor 1 updates A’s balance, Actor 2 isn’t able to update A’s balance until after the first transaction is committed or rolled back.)
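The interleaving above can be replayed deterministically without a database. The following Go sketch is only a model: a map stands in for the accounts table, and the statement order is hard-coded to match the two-actor sequence, with both reads happening before either write.

```go
package main

import "fmt"

// replay runs the two-actor interleaving against an in-memory stand-in
// for the accounts table and returns the final balances.
func replay() map[string]int {
  accounts := map[string]int{"A": 10, "B": 0}
  const amount = 10

  // Both actors run their select before either one updates.
  seenByActor1 := accounts["A"]
  seenByActor2 := accounts["A"]

  // Actor 1: the balance check passes, so it updates and commits.
  if seenByActor1 >= amount {
    accounts["A"] -= amount
    accounts["B"] += amount
  }

  // Actor 2: the check passes against a stale balance, so it also
  // updates and commits.
  if seenByActor2 >= amount {
    accounts["A"] -= amount
    accounts["B"] += amount
  }
  return accounts
}

func main() {
  final := replay()
  fmt.Println("A:", final["A"], "B:", final["B"])
}
```

The bug is the gap between reading the balance and acting on it, which is the same check-then-act shape that the fixes below close in different ways.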

`check` constraints

There are a few ways we can fix this. One way would be to add a check constraint:

alter table accounts add constraint nonnegative_balance check (balance >= 0);

With this constraint, Actor 2’s update will fail because the constraint would be violated. In fact, we would no longer even need to check the previous balance in our application code at all, because the database itself would ensure no account’s balance ever goes below zero.

Table locks

Another approach would be to use a lock. Before we start reading or writing data from the accounts table, we can use a lock to ensure that our transaction has exclusive access to that table until we roll back or commit:

 begin;
+lock table accounts;
 select balance from accounts where name = 'A';
 update accounts set balance = balance - 10 where name = 'A';
 update accounts set balance = balance + 10 where name = 'B';
 commit;

The lock table accounts statement will not finish until no other transactions have any locks on the accounts table, and will prevent all other transactions from accessing the accounts table until our transaction is committed or rolled back.

Row locks

Locking the entire accounts table is an effective way to prevent overdrawing an account, but it also needlessly slows down our banking program. If someone is trying to move money from A to B while someone else is trying to move money from B to C, the second person’s transaction won’t be able to start until the first transaction completes, even though they’re touching different accounts.

Luckily, rather than acquiring a lock on the entire table, we can just acquire a lock on the row that we’re deducting money from. To do this, we can use for update at the end of our select statement:

select balance from accounts where name = 'A' for update;

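Now, other transactions won’t be able to update or delete this row, or lock it with their own select ... for update, until our transaction is committed or rolled back. Plain select statements can still read the row, so this only closes the race if every transaction that moves money reads the balance with for update. (The row lock is held until the end of the transaction, so for update is only useful inside one.)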

Transaction isolation levels

One other way to ensure that we don’t overdraw an account is to change the isolation level of the transaction:

begin transaction isolation level serializable;
select balance from accounts where name = 'A';
update accounts set balance = balance - 10 where name = 'A';
update accounts set balance = balance + 10 where name = 'B';
commit;

The PostgreSQL manual has a good description of serializable:

If a pattern of reads and writes among concurrent serializable transactions would create a situation which could not have occurred for any serial (one-at-a-time) execution of those transactions, one of them will be rolled back with a serialization_failure error.

Where a row or table lock would make a second transaction wait and then proceed with up-to-date data once the previous transaction committed, with isolation level serializable one of the conflicting transactions fails instead. In our example, Actor 2’s update waits for Actor 1 to commit and then fails with the error message: “could not serialize access due to concurrent update.”
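Since a serializable transaction can be rolled back through no fault of its own, the calling code usually wants to run it again. Here is a minimal Go sketch of such a retry loop. The sqlStateError type is a hypothetical stand-in for whatever error type your database driver returns; the only PostgreSQL-specific fact used is that serialization_failure has SQLSTATE code 40001.

```go
package main

import (
  "errors"
  "fmt"
)

// serializationFailure is the SQLSTATE code for serialization_failure.
const serializationFailure = "40001"

// sqlStateError is a hypothetical stand-in for a driver error that
// carries a SQLSTATE code.
type sqlStateError struct{ Code string }

func (e *sqlStateError) Error() string { return "SQLSTATE " + e.Code }

// retrySerializable runs txn until it succeeds, fails with something
// other than a serialization failure, or maxAttempts is reached.
func retrySerializable(maxAttempts int, txn func() error) error {
  var err error
  for attempt := 1; attempt <= maxAttempts; attempt++ {
    err = txn()
    var stateErr *sqlStateError
    if err == nil || !errors.As(err, &stateErr) || stateErr.Code != serializationFailure {
      return err
    }
  }
  return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
  calls := 0
  err := retrySerializable(5, func() error {
    calls++
    if calls < 3 {
      // Simulate losing to a concurrent transaction twice.
      return &sqlStateError{Code: serializationFailure}
    }
    return nil
  })
  fmt.Println("calls:", calls, "err:", err)
}
```

The function passed in should contain the whole transaction, from begin to commit, so that each attempt re-reads the balance rather than reusing a stale value.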

There’s a good explanation of the “serializable” consistency model—and how it differs from other models—on the Jepsen site.

Flame Graphs for Go With pprof

2022-03-11T00:00:00Z

This week, I was working on a Go program and I wanted to understand which part was taking the most time. I had seen some people use flame graphs for this, but had never made one myself, so I decided to try it out. It took a little time to figure out the right tools to use, but once I did it was pretty easy. Here’s what one looks like (full size):

On the X axis, a wider bar means more time spent, and on the Y axis you can see the call stack (functions lower down are calling functions higher up). There’s no particular meaning to the colors.

Unfortunately I can’t share the actual program I was working on, but I’ll show you the steps on a little program called whoami which starts a web server that responds to each request by simply writing back the IP address of the requester:

package main

import (
  "log"
  "net/http"
  _ "net/http/pprof"
)

func main() {
  http.HandleFunc("/", handle)
  log.Fatal(http.ListenAndServe(":8080", nil))
}

func handle(w http.ResponseWriter, r *http.Request) {
  log.Printf("handling request from: %s", r.RemoteAddr)
  if _, err := w.Write([]byte(r.RemoteAddr)); err != nil {
    log.Printf("could not write IP: %s", err)
  }
}

Go’s standard library includes some tools for profiling the running program through its various pprof packages and utilities. Here, I’m importing net/http/pprof, which exposes /debug/pprof endpoints on the DefaultServeMux.

Profiling our web server

I want to make sure whoami is actually serving requests when I profile it, so I’m using wrk to generate lots of requests by running wrk -d 30s 'http://localhost:8080'. With that running, I can fetch a 20 second CPU profile from the pprof server:

$ go tool pprof \
  -raw -output=cpu.txt \
  'http://localhost:8080/debug/pprof/profile?seconds=20'

This creates a file called cpu.txt containing the decoded pprof samples, which are what I need to build my flame graph.

Brendan Gregg, the inventor of flame graphs, has published some scripts to turn pprof output into an interactive flame graph. Since whoami is a Go program, I can use stackcollapse-go.pl to convert the samples to the right format for flamegraph.pl (both from this repository):

$ ./stackcollapse-go.pl cpu.txt | flamegraph.pl > flame.svg

Click here to see the result!

Make it faster!

One thing I noticed though is that about 30% of the time it takes to serve() each request seems to be spent in log.Printf, which needs to make a write system call to print the message to the terminal:

Maybe we can make our server faster by removing the logging? But to know if we can make it “faster,” we need to know how fast it is right now.

One interesting thing about flame graphs is that they don’t measure time in seconds, they measure it in samples. When you run pprof, it checks what your program is doing 100 times per second, so the flame graph is just an aggregation of “how many times was each unique stack trace sampled.” (This means that if you have a function that’s called rarely, it might not even appear in the flame graph at all!)
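To illustrate the aggregation, here is a small Go sketch that counts identical stacks and prints them in the "folded" form that flamegraph.pl reads: frames joined by semicolons, then a space and the sample count. The samples and function names are invented for the example, and this is a simplification of what stackcollapse-go.pl does, not a replacement for it.

```go
package main

import (
  "fmt"
  "sort"
  "strings"
)

// foldSamples counts how many times each unique stack was sampled.
// Each sample lists its frames from outermost caller to innermost callee.
func foldSamples(samples [][]string) map[string]int {
  counts := make(map[string]int)
  for _, stack := range samples {
    counts[strings.Join(stack, ";")]++
  }
  return counts
}

func main() {
  // Invented samples: two caught the program logging, one writing.
  samples := [][]string{
    {"main", "serve", "handle", "log.Printf"},
    {"main", "serve", "handle", "Write"},
    {"main", "serve", "handle", "log.Printf"},
  }
  counts := foldSamples(samples)

  // Sort the stacks so the output order is stable.
  stacks := make([]string, 0, len(counts))
  for stack := range counts {
    stacks = append(stacks, stack)
  }
  sort.Strings(stacks)
  for _, stack := range stacks {
    fmt.Printf("%s %d\n", stack, counts[stack])
  }
}
```

The width of a bar in the final graph is proportional to these counts, which is why a function that never happens to be running when a sample is taken leaves no trace.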

So to tell how fast whoami is in absolute terms, we can use wrk to gather some initial statistics. I want to run wrk again (rather than use the results from when I was profiling) because profiling your program will slow it down.

$ wrk -d 10s 'http://localhost:8080'
Running 10s test @ http://localhost:8080
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   152.41us   51.27us   2.81ms   86.24%
    Req/Sec    31.30k     2.27k   39.31k    72.77%
  628837 requests in 10.10s, 76.76MB read
Requests/sec:  62261.70
Transfer/sec:      7.60MB

When I remove the call to log.Printf and re-run the server, wrk now reports:

$ wrk -d 10s 'http://localhost:8080'
Running 10s test @ http://localhost:8080
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   117.23us   30.28us   1.02ms   78.11%
    Req/Sec    40.28k     3.09k   47.51k    69.80%
  809605 requests in 10.10s, 98.83MB read
Requests/sec:  80160.78
Transfer/sec:      9.79MB

Sure enough, it looks like the average latency decreased by around 23% (from 152 to 117 µs), and the requests per second increased from 62k to 80k, around 29%!

I don’t have super high confidence in this measurement though. I’m doing this all on my laptop, so I’m not sure if the system calls that wrk is making to send the requests are slowing down the system calls whoami is making to read the requests and write the responses at all.

tl;dr: How to Make a Flame Graph from a `pprof` source

Download the scripts from Brendan Gregg’s FlameGraph repo and then, assuming SOURCE is either a pprof file or URL, run these commands:

$ go tool pprof -raw -output=cpu.txt SOURCE
$ stackcollapse-go.pl cpu.txt | flamegraph.pl > cpu.svg

You can also use pprof's web UI to do this without needing any external scripts:

$ go tool pprof -http=: SOURCE

Then navigate to View > Flame Graph in the pprof UI that opens in your browser.

Contributing to the aerc email client

2022-03-05T00:00:00Z

Aerc is an open-source email client that runs in your terminal. During 2019 and early 2020, I contributed 32 patches to the project. Many of them were minor enhancements, bug fixes, and documentation updates, but I also contributed a number of more substantial features.

While I no longer use aerc regularly, I found it really rewarding to work with the other contributors and to interact with the community of people who were using software I helped write.

Maildir backend

When I first encountered aerc, it only supported browsing via IMAP. At the time, however, all of my email was synced to my machine and stored in a Maildir (essentially, a specific directory structure where each file represents an email message). Because I wanted to use aerc, and I didn’t want to lose my offline mail reading capabilities, I decided to add support for Maildir backends in addition to the existing IMAP.

While the maintainer had already planned to support multiple backends, including Maildir, only the IMAP backend had been implemented. This meant that there were a few places where the UI layer was tightly coupled with IMAP-specific functionality, so my first task was to extract slightly more generic models that could be used by various backends.

For example, many IMAP operations specify messages by their UID, which is a server-assigned unique identifier. The UI layer had initially been implemented in a way that used UIDs to distinguish between messages. However, in the context of a Maildir, messages don’t have server-assigned UIDs, but the UI layer still needs some way to keep track of which message(s) are being selected in the browser.

After a few revisions, my patch set for Maildir support was applied.

`:unsubscribe` command

Aerc is somewhat vim-like in its keyboard interface. Users type commands into a command line (or use keybindings to enter common commands more quickly), and then press the Return/Enter key to perform the action. Some of the common commands include things like :reply, :send, :attach, etc.

At this point, I think most people are probably familiar with “Unsubscribe” links that are included in messages from mailing lists. Astute users of certain mail readers like Gmail or Apple Mail may have even noticed that the mail application itself sometimes gives you a convenient “Unsubscribe” button, outside of the email display itself:

I had always assumed that these were being parsed out of the message body¹, but it turns out that RFC 2369 defines a List-Unsubscribe mail header that senders can add to their messages to provide a convenient way for people to unsubscribe.

The List-Unsubscribe header can contain one or multiple URLs. A mailto: URL can be used to specify an unsubscribe address, or an HTTP URL can specify a web page to visit in order to unsubscribe. I sent a patch to add an :unsubscribe command to aerc.
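To give a feel for the header format, here is a small Go sketch that pulls the URLs out of a List-Unsubscribe value. It is not the code from my aerc patch; parseListUnsubscribe and the example addresses are made up, and the sketch assumes the RFC 2369 layout where each URL is wrapped in angle brackets (comments in parentheses are ignored).

```go
package main

import (
  "fmt"
  "net/url"
  "strings"
)

// parseListUnsubscribe extracts the URLs from a List-Unsubscribe header
// value. It scans for angle-bracketed spans rather than splitting on
// commas, since a URL may itself contain a comma.
func parseListUnsubscribe(value string) []*url.URL {
  var urls []*url.URL
  for {
    start := strings.Index(value, "<")
    if start < 0 {
      break
    }
    end := strings.Index(value[start:], ">")
    if end < 0 {
      break
    }
    raw := strings.TrimSpace(value[start+1 : start+end])
    if u, err := url.Parse(raw); err == nil {
      urls = append(urls, u)
    }
    value = value[start+end+1:]
  }
  return urls
}

func main() {
  // Hypothetical header value with one mailto: and one HTTPS option.
  header := "<mailto:list-request@example.com?subject=unsubscribe>, <https://example.com/unsubscribe?id=42>"
  for _, u := range parseListUnsubscribe(header) {
    fmt.Println(u.Scheme, u.String())
  }
}
```

A mail client can then switch on the scheme: compose a message for mailto: and open a browser for http or https.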

Address book integration

The third major feature I contributed was support for integrating an address book or contact list. Because many of aerc’s users tend to be familiar with or already use this sort of terminal-oriented utility, I took a lot of inspiration from the way mutt (another command line email client) handles contact integration.

I added an address-book-cmd configuration option that enabled users to configure an external command which aerc could run to fetch address completions. This command was expected to print out email addresses from the user's address book, and aerc would present the options in its tab completion system.

For my own personal use of this feature, I also created a janky little program that would print out completions for use in aerc by hitting a CardDAV endpoint.

¹ And maybe in some cases they actually are; I don’t have any visibility into how these email clients are implemented!

Lutron Universal Wireshark

2022-02-27T00:00:00Z

One of my all-time favorite tools is Wireshark. During college, my summer internship at Lutron Electronics was focused on packaging a custom internal build of Wireshark, complete with new dissectors for Lutron’s proprietary network protocols.

Lutron’s lighting control hardware communicates using a variety of proprietary wired and wireless link protocols. My goal was to make it quicker and easier for R&D engineers and field technicians to debug and verify hardware by enabling them to capture and dissect these proprietary protocols using Wireshark.

There had been prior efforts to build Lutron protocol dissectors into Wireshark, but there had been a few challenges:

- The customized builds of Wireshark would become outdated when new commands were added to the Lutron protocols, and when new versions of Wireshark were released.
- Different teams had built dissectors for their specific protocols, meaning that there wasn’t a single version of Wireshark which could capture and dissect any Lutron protocol.
- Capturing was limited to Ethernet-based protocols, and was not available for serial data.

After meeting with stakeholders across the company to gain a better understanding of the problem, I went to work trying to resolve the issues that people were facing.

First, I created a Jenkins pipeline on an existing CI server so that when a new release of Wireshark was published, we could simply run the pipeline to compile and package a new installer, and publish it to an internal network drive.

Next, I looked at the dissector code other teams had written and worked to integrate them into the CI/CD pipeline. However, this didn’t completely solve the problem of new commands being added and not being reflected in Wireshark. To make this easier, I wrote a script that would parse specially-formatted comments out of a C header file and generate appropriate Wireshark dissector code.

Finally, to address the need to view serial data in Wireshark, I wrote a program to capture data from a USB serial interface and output it in pcap format, and wrote a small wrapper script in Lua to expose it as a Wireshark plugin.

I wrapped up the summer by presenting sessions about how to use Wireshark to teams across the company, including product development, QA, and field service.

While I spent a good chunk of time working on Wireshark, most of my work unfortunately could not be contributed upstream due to its proprietary nature. I was able to contribute a minor patch to resolve a quoting issue in a build script.

Intercepting Go TLS Connections with Wireshark

2021-05-14T00:00:00Z

I wrote previously about how I like to use mitmproxy for debugging HTTP services. This is a continued exploration of debugging network services, in particular focused around inspecting TLS encrypted traffic that your application is sending and receiving.

Transport Layer Security is a fundamental building block of modern secure communications on the Internet, and increasingly the software we write is expected to be a fluent speaker of TLS. While this brings security benefits for users, it also increases the complexity of understanding what our software is doing because when we try to use tools like Wireshark or tcpdump to inspect network traffic, all we see is encrypted data. Let’s see what a regular HTTP request looks like in Wireshark:

$ curl http://www.benburwell.com

Here, we can see the HTTP request and response. But what happens when we make the request over TLS?

$ curl https://www.benburwell.com

Here all we see are some TLS packets with embedded “encrypted application data.” We can see that a connection is being made, but we can’t inspect the raw HTTP request or response as we’d like to.

But all is not lost! There is a way for Wireshark to decrypt TLS connections and show you dissected application protocol packets; it just requires a little configuration. To understand how this works, we first need to understand a little bit about TLS.

How decrypting TLS in Wireshark works

TLS encrypts data within a session using symmetric keys derived from a “master secret,” which is established by using a key exchange protocol. So in order for Wireshark to be able to decrypt and dissect TLS packets, we need some way to tell it the master secret for the session.

The master secret is agreed upon using a cryptographic protocol when the TLS connection is established. The exact implementation varies, but in general the client and the server use some clever math to derive a value that is known at both ends and yet is never directly sent over the wire, such that it is computationally expensive for intermediate observers to derive the secret for themselves. We won’t get into the specifics, but one important detail for later is that this exchange involves the client sending the server a large random number in plain text, before the encrypted stream begins.

Conveniently, many TLS client libraries support the use of a key log file, which does pretty much exactly what it sounds like: when the SSLKEYLOGFILE environment variable is set, the library writes the key needed to decrypt the traffic each time it establishes a TLS connection. Originally, this was implemented in Mozilla’s (at the time Netscape’s) Network Security Services library, so you might also see it referred to as an “NSS Key Log File.” Let’s give this a try!

$ SSLKEYLOGFILE=/tmp/keys curl -s https://www.benburwell.com >/dev/null
$ cat /tmp/keys
CLIENT_RANDOM 40b1a54e6b38f7accb90e1f5162534b8628389f4257e39f614a3ca28514db2c7 3121d2812c459996b072165c2ece4a1c85687d7073de06be0e1c16bf4a862fbe26a8cba24db1a4a0a9684fb19ad52f97

(Note that SSLKEYLOGFILE support was only enabled by default in curl 7.58, so if this isn’t working for you, check which version of curl you have.)

This line in the key log means that for the TLS connection that was initiated with the CLIENT_RANDOM of 40b1..., the master secret is 3121.... So now we just need to tell Wireshark about this. Let’s start a new capture and make another request:

Now, we can right-click on the “Transport Layer Security” layer and select Protocol Preferences -> (Pre)-Master-Secret log filename... and enter the path to our SSLKEYLOGFILE, /tmp/keys, and something magical happens:

Now, when Wireshark encounters a TLS handshake, it can extract the random value sent by the client and consult the key log file to discover a matching CLIENT_RANDOM line and use the corresponding session master secret to decrypt the data sent over the connection. So in addition to seeing the TLS details as before, we can also see the decrypted HTTP requests!
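The lookup described above is simple enough to sketch in Go. This is an illustration of the idea, not Wireshark's implementation: parseKeyLog is a made-up helper, and the hex values in main are shortened placeholders rather than real secrets.

```go
package main

import (
  "bufio"
  "fmt"
  "strings"
)

// parseKeyLog maps each CLIENT_RANDOM value in a key log to its master
// secret. Comment lines and other record types are skipped.
func parseKeyLog(contents string) map[string]string {
  secrets := make(map[string]string)
  scanner := bufio.NewScanner(strings.NewReader(contents))
  for scanner.Scan() {
    fields := strings.Fields(scanner.Text())
    if len(fields) == 3 && fields[0] == "CLIENT_RANDOM" {
      secrets[fields[1]] = fields[2]
    }
  }
  return secrets
}

func main() {
  // Shortened placeholder values standing in for the long hex strings.
  keyLog := "# key log written by a TLS client\nCLIENT_RANDOM 40b1 3121\n"
  secrets := parseKeyLog(keyLog)

  // clientRandom stands in for the value seen in the captured handshake.
  clientRandom := "40b1"
  if secret, ok := secrets[clientRandom]; ok {
    fmt.Println("master secret:", secret)
  } else {
    fmt.Println("no matching key log line; traffic stays encrypted")
  }
}
```

This also shows why the plain-text client random matters: it is the only piece of the handshake an observer needs in order to find the right line in the file.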

Configuring Go to use a TLS Key File

Go doesn’t support the SSLKEYLOGFILE environment variable directly, but it does have a different mechanism to achieve the same result. The crypto/tls.Config struct has a KeyLogWriter field:

// KeyLogWriter optionally specifies a destination for TLS master secrets
// in NSS key log format that can be used to allow external programs
// such as Wireshark to decrypt TLS connections.
KeyLogWriter io.Writer

In typical Go fashion, I/O has been abstracted to an io.Writer interface. Since we can use an *os.File to satisfy this interface, all we need to do to produce a file containing the TLS secrets is to open a file and pass that through the tls.Config.KeyLogWriter:

package main

import (
  "crypto/tls"
  "net/http"
  "os"
)

func main() {
  f, err := os.OpenFile("/tmp/keys", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
  if err != nil {
    panic(err)
  }
  defer f.Close()

  client := &http.Client{
    Transport: &http.Transport{
      TLSClientConfig: &tls.Config{
        KeyLogWriter: f,
      },
    },
  }

  resp, err := client.Get("https://www.benburwell.com")
  if err != nil {
    panic(err)
  }
  resp.Body.Close()
}

Building and running our program results in CLIENT_RANDOM lines being appended to our /tmp/keys file and picked up by Wireshark, which in turn is able to decrypt the messages being sent by our program:

In practice, for decrypting HTTP traffic for debugging, I find mitmproxy to be faster and easier, since it doesn’t require changes to the program. However, sometimes it’s preferable to look at the actual bytes on the wire, which is where using a key log file with Wireshark might be a better approach.

Additionally, there are plenty of protocols other than HTTP that use TLS connections, and where proxying isn’t an option. For example, I’ve used a key log file with Wireshark to debug a Go program that was making an IMAP connection to a mail server. Because of the way Go’s libraries tend to be layered, the code to do this was very similar to the HTTP example above; I just needed to use my custom tls.Config when constructing an IMAP client instead of an HTTP client.

Debugging HTTP services with mitmproxy

2021-05-06T00:00:00Z

I spend a lot of my time at work writing Go services that talk to other Go services over HTTP. Much of the time, everything works as expected, but every now and then a situation arises where I’m struggling to understand why my program is receiving a specific value. Is my request not being built correctly? Am I not properly deserializing the response? Logging can be helpful, but sometimes I really just want to look at the HTTP traffic between services.

One tool that I really love in these situations is mitmproxy: “a free and open source interactive HTTPS proxy,” according to its website.

There is no shortage of features and options for mitmproxy, and when I was first exploring it they were a little daunting. I’m sure there are still tons of things that it can do that I don’t know about, but the main thing I tend to use it for is reverse proxying.

Reverse proxying is a pretty simple concept; basically it means that you send your HTTP requests to a specific proxy endpoint, and the proxy repeats your request to some specific origin server. Then the same thing just happens backwards: the origin server sends its reply to the reverse proxy which passes it along back to you. This means that the proxy can see (and log!) the actual HTTP request you send, and the response sent by the origin server.
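To make the idea concrete, here is a minimal sketch of a logging reverse proxy built from Go’s standard library. It is not how mitmproxy itself is implemented; the newLoggingProxy and fetchThroughProxy names are mine, and the origin is a throwaway local test server rather than a real service:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newLoggingProxy returns a handler that repeats each request to origin
// and logs the request and the response status as they pass through.
func newLoggingProxy(origin *url.URL) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(origin)
	forward := proxy.Director
	proxy.Director = func(r *http.Request) {
		forward(r)
		r.Host = origin.Host // send the origin's Host header, not the proxy's
	}
	proxy.ModifyResponse = func(resp *http.Response) error {
		log.Printf("%s %s -> %s", resp.Request.Method, resp.Request.URL, resp.Status)
		return nil
	}
	return proxy
}

// fetchThroughProxy starts a stand-in origin and a proxy in front of it,
// sends one request to the proxy, and returns the body the client received.
func fetchThroughProxy() (string, error) {
	origin := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from origin")
	}))
	defer origin.Close()

	originURL, err := url.Parse(origin.URL)
	if err != nil {
		return "", err
	}
	proxy := httptest.NewServer(newLoggingProxy(originURL))
	defer proxy.Close()

	// The client only ever talks to the proxy's address.
	resp, err := http.Get(proxy.URL + "/greeting")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetchThroughProxy()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(body)
}
```

The client gets the origin’s reply unchanged, while the proxy sits in the middle with a full view of both directions, which is exactly the vantage point that makes this useful for debugging.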

Earlier today, I was working on a program that sent requests to an HTTP server and my program’s output didn’t make sense. I wasn’t sure if my requests were being sent incorrectly, or maybe there was a bug in the server I was talking to. So I fired up mitmproxy to take a look. In my shell, I ran:

mitmproxy --mode reverse:https://service.dev.example.com --listen-port 8080

This opens up a log window where any requests handled by the reverse proxy will be displayed. I quickly updated my program to make its requests to http://localhost:8080 instead of https://service.dev.example.com and re-ran it. The request and response were logged in the terminal window, and I was quickly able to identify that a particular dependency needed to be updated.
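If you do this often, it can help to make the base URL overridable so that pointing the program at the proxy doesn’t require a code change. A small sketch, assuming a made-up SERVICE_BASE_URL environment variable and a hypothetical baseURL helper:

```go
package main

import (
	"fmt"
	"os"
)

// defaultBaseURL is the real service from the example above.
const defaultBaseURL = "https://service.dev.example.com"

// baseURL returns override when it is set, and the real service otherwise.
func baseURL(override string) string {
	if override != "" {
		return override
	}
	return defaultBaseURL
}

func main() {
	// SERVICE_BASE_URL is a hypothetical variable name chosen for this sketch.
	fmt.Println("sending requests to", baseURL(os.Getenv("SERVICE_BASE_URL")))
}
```

Running the program with SERVICE_BASE_URL=http://localhost:8080 would then route its requests through the reverse proxy, and unsetting the variable goes back to the real service.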

Of course, you could also use Wireshark or tcpdump to inspect network traffic, and these are great options that I also use frequently! But the main reason I tend to turn first to mitmproxy is that it handles TLS like a honey badger -- it just doesn’t give a shit. Basically, you can throw whatever you want at it and it’ll just do the right thing:

Client  --TLS-->   mitmproxy  --TLS-->   Origin
Client  --HTTP-->  mitmproxy  --TLS-->   Origin
Client  --TLS-->   mitmproxy  --HTTP-->  Origin
Client  --HTTP-->  mitmproxy  --HTTP-->  Origin

How does this work? When you first run mitmproxy, it generates a certificate authority that it uses to generate certificates on the fly. All you need to do is add the CA certificate to your OS trust store (see the mitmproxy docs about certificates). For example, if I run mitmproxy --mode reverse:https://www.benburwell.com --listen-port 8080, and then connect over TLS, I can see the certificate that mitmproxy generated:

$ openssl s_client -connect localhost:8080
CONNECTED(00000005)
depth=1 CN = mitmproxy, O = mitmproxy
verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/CN=www.benburwell.com
   i:/CN=mitmproxy/O=mitmproxy
 1 s:/CN=mitmproxy/O=mitmproxy
   i:/CN=mitmproxy/O=mitmproxy

Here, the mitmproxy CA has generated a certificate with CN=www.benburwell.com to match the hostname I’m reverse proxying to!

Now, there are ways to snoop on TLS encrypted traffic with Wireshark as well using a TLS key log file, but this usually involves making somewhat non-trivial modifications to the program you’re working with. It’s not very complicated or difficult, and it’s a technique I’ve used a few times, but mitmproxy is usually quicker and easier for me. I plan to write a post about this topic in the future, so stay tuned! (Update: see my post about decrypting TLS in Wireshark)

Learning About Syscall Filtering With Seccomp

2020-06-27T00:00:00Z

I’d heard about being able to run Docker containers with a custom security profile, but wasn’t really sure what that meant or what was happening behind the scenes, so I decided to do some experimentation to find out.

It turns out that the Linux kernel includes a feature called “secure computing mode,” or seccomp for short. Using seccomp lets you tell the kernel that you only expect your program to use a specific set of system calls, and if your program makes any system calls that aren’t in your approved list, the kernel should kill your program.

But why would you want to do this? I think if you had a pretty simple program, using seccomp might be overkill. But if your program makes different system calls depending on possibly-untrustworthy user input, it might make sense to try to limit what the program is allowed to do. Looking at a list of software using seccomp on Wikipedia backs this up: the software listed is mostly hypervisors/container runners (like Docker), web browsers, etc.

By reading the manual page for the seccomp(2) system call, we can learn how to write a program to try this out. The simplest action is to enter “strict mode,” which prevents all system calls except for read(2), write(2), _exit(2), and sigreturn(2) --- in other words, what I think should be just enough to write hello world! Let’s give it a shot:

#include <linux/seccomp.h>
#include <stdio.h>
#include <sys/prctl.h>

int
main()
{
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
                perror("prctl");
                return 1;
        }
        printf("hello, world!\n");
        return 0;
}

When I compile and run my program, I just see Killed being printed, not hello, world!. Well, this is pretty good evidence that seccomp is doing something --- it’s at least killing my program! Let’s try to find out why it’s being killed using strace, a program that shows you all of the system calls being made:

$ strace ./hello
execve("./hello", ["./hello"], 0x7fff77b754b0 /* 20 vars */) = 0
brk(NULL)                               = 0x559e08463000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25762, ...}) = 0
mmap(NULL, 25762, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe65b9f0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fe65b9ee000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7fe65b3df000
mprotect(0x7fe65b5c6000, 2097152, PROT_NONE) = 0
mmap(0x7fe65b7c6000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7fe65b7c6000
mmap(0x7fe65b7cc000, 15072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe65b7cc000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7fe65b9ef4c0) = 0
mprotect(0x7fe65b7c6000, 16384, PROT_READ) = 0
mprotect(0x559e077b9000, 4096, PROT_READ) = 0
mprotect(0x7fe65b9f7000, 4096, PROT_READ) = 0
munmap(0x7fe65b9f0000, 25762)           = 0
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) = 0
fstat(1,  <unfinished ...>)             = ?
+++ killed by SIGKILL +++
Killed

There’s a lot at the beginning about loading dynamically linked libraries, reading the program binary, and mapping it into memory that I don’t fully understand. But the last few syscalls provide some clues: right after prctl is called, we see fstat being called! fstat is a system call for getting the status of a file, and 1 happens to be the file descriptor for standard output. It makes sense that calling printf might involve checking the status of standard output, so I tried commenting out the call to printf in hello.c. When I compiled and ran the new version, it still just printed Killed, so I used strace again. Just looking at the last few lines:

prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) = 0
exit_group(0)                           = ?
+++ killed by SIGKILL +++
Killed

Now my program is making the exit_group system call. Thinking back to the manual page for seccomp, it said:

The only system calls that the calling thread is permitted to make are read(2), write(2), _exit(2) (but not exit_group(2)), and sigreturn(2).

It looks like I’ll need to actually do some real filtering if I want to run my hello world program and not just use strict mode. To do this, we need to use SECCOMP_MODE_FILTER and pass a pointer to a struct sock_fprog, which according to the manpage is “a Berkeley Packet Filter program designed to filter arbitrary system calls and system call arguments.”

While we could construct a BPF program using an array of struct sock_filters, looking at the chain of instructions we’d need made me think it would be much easier to enlist the services of libseccomp, a library designed for just this purpose. Let’s try rewriting hello.c to use libseccomp and allowing those three syscalls we saw before (fstat, write, and exit_group):

#include <seccomp.h>
#include <stdio.h>
#include <stdlib.h>

scmp_filter_ctx ctx;

/* graceful_exit cleans up our seccomp context before exiting */
void
graceful_exit(int rc)
{
        seccomp_release(ctx);
        exit(rc);
}

/* setup_seccomp initializes seccomp and loads our BPF program that filters
 * syscalls into the kernel */
void
setup_seccomp()
{
        int rc;

        /* Initialize the seccomp filter state */
        if ((ctx = seccomp_init(SCMP_ACT_KILL)) == NULL) {
                graceful_exit(1);
        }
        if ((rc = seccomp_reset(ctx, SCMP_ACT_KILL)) != 0) {
                graceful_exit(1);
        }

        /* Add allowed system calls to the BPF program */
        if ((rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0)) != 0) {
                graceful_exit(1);
        }
        if ((rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0)) != 0) {
                graceful_exit(1);
        }
        if ((rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0)) != 0) {
                graceful_exit(1);
        }

        /* Load the BPF program for the current context into the kernel */
        if ((rc = seccomp_load(ctx)) != 0) {
                graceful_exit(1);
        }
}

int
main()
{
        setup_seccomp();
        printf("hello, world!\n");
        graceful_exit(0);
}

Since we’re now using libseccomp, we need to tell our C compiler to link the library:

$ cc -o hello hello.c -lseccomp
$ ./hello
hello, world!

Success! Our program compiles and runs, and all of the necessary syscalls have been allowed. Now let’s try modifying the main() function of our program to do something bad, like trying to read the password file /etc/shadow:

int
main()
{
        FILE *fd;
        setup_seccomp();
        printf("hello, world!\n");
        if ((fd = fopen("/etc/shadow", "r")) == NULL) {
                perror("fopen");
                graceful_exit(1);
        }
        fclose(fd);
        graceful_exit(0);
}

Now when we compile and run our program, we get:

$ ./hello
hello, world!
Bad system call (core dumped)

Nice! The kernel killed our program when we tried to use a system call (openat) that we didn’t plan on!

I wanted to figure out how to allow openat to only open a specific file name, but I couldn’t figure out how to compare string system call arguments. Thanks to Isaiah Bell for referring me to the explanation for why this isn’t possible: to prevent time-of-check-time-of-use problems.

Now let’s go back to how this all fits into Docker. Looking at Docker’s default seccomp profile, a lot of it starts to make more sense. In fact, it looks like they’re using the exact same names from libseccomp that we used in our program! If we search the moby source code for libseccomp, we can see that it is indeed being used (via Go bindings).

Let’s try to use a custom seccomp profile to prohibit programs in our Docker container from listening for network connections. To start, I want to make sure I can accept network connections, then modify my profile and watch it break. I downloaded the default seccomp profile to use as a starting point for tweaking, started a container with port 4000 open, then used nc to try communicating from my host machine to a listener in the Docker container:

$ docker run --rm -it -p 4000:4000 --security-opt seccomp=seccomp.json alpine
/ # nc -l -p 4000

When I run echo hi | nc 127.0.0.1 4000 in a separate terminal, my greeting is printed by the netcat listener in the Docker container---success! Now that I know my basic TCP server works, let’s try blocking it with seccomp! To start listening on a TCP port, I know that nc has to use the socket, bind, and listen system calls (which we can verify using strace). I’ll try removing them from the list of allowed system calls in the default profile, and run the docker container again with the modified profile:

$ docker run --rm -it -p 4000:4000 --security-opt seccomp=seccomp.json alpine
/ # nc -l -p 4000
nc: socket(AF_INET,1,0): Operation not permitted

Awesome! We just used seccomp to control what our Docker container is allowed to do!

I can imagine this might be helpful if you had an environment where security was extremely important and wanted to really lock down your containers, but it’s hard to imagine that writing custom seccomp profiles for every container in your production environment is the best use of time without having some specific situation you’re trying to address.

How to Add Row Level Security to Views in PostgreSQL

2020-04-02T00:00:00Z

Recently, I needed to store some customer-specific data in a PostgreSQL database and grant customers access to only their data in the shared tables. Fortunately, PostgreSQL has support for row level security in conjunction with its RBAC model which helps us do exactly that.

While row level security does exactly what we need it to for tables, I ran into a challenge when I needed to apply the same row level security to views built from the tables: row level security is only available on tables, not on views! Luckily, I was able to find a way to accomplish what I needed to and learned some more about Postgres along the way.

How to follow along in a Docker “lab” with our schema and dummy data:

# Run the docker container:
$ docker run --rm --detach --name rlslab benburwell/postgres-rls-lab

# Connect to the database in the container using psql:
$ docker exec -it rlslab psql -U postgres

# Remember to stop the container when you’re done!
$ docker stop rlslab

Back to the good stuff:

Let’s start off by creating some tables that we’ll store customer-specific data in. To grant our customers access to only their data in these tables, we’ll be creating a role for each customer, e.g. customer_a, customer_b, and so on, and we’ll include a customer_user column on each table that specifies the role which should have access to that row:

CREATE TABLE milestones (
  id serial primary key,
  customer_user varchar,
  name varchar
);

CREATE TABLE milestone_events (
  milestone_id int,
  customer_user varchar,
  name varchar
);

Now, we’ll create the customer users. To simplify management, we can create a generic customer role that has the access we want each customer to have, and then just grant that role to new customers as we onboard them.

CREATE ROLE customer;
GRANT SELECT ON milestones TO customer;
GRANT SELECT ON milestone_events TO customer;

Next, we’ll create our individual customer roles and grant them the privileges from the generic customer role we just created:

CREATE ROLE customer_a;
CREATE ROLE customer_b;
GRANT customer TO customer_a, customer_b;

Next, let’s populate our milestones and milestone_events tables with some dummy data:

postgres=# SELECT * FROM milestones;
 id | customer_user |          name
----+---------------+---------------------------
  1 | customer_a    | A great milestone
  2 | customer_a    | Another milestone
  3 | customer_b    | Customer B milestone
  4 | customer_c    | Spooky invisible milestone

postgres=# SELECT * FROM milestone_events;
 milestone_id | customer_user |      name
--------------+---------------+----------------
            1 | customer_a    | First task
            1 | customer_a    | Second task
            2 | customer_a    | Another task
            3 | customer_b    | B event
            4 | customer_c    | Invisible task

Now, we’ll add the row-level security policies to these tables so that customer users only have access to the appropriate rows in these tables:

postgres=# ALTER TABLE milestones ENABLE ROW LEVEL SECURITY;
ALTER TABLE
postgres=# CREATE POLICY customer_access ON milestones
postgres-# FOR SELECT
postgres-# USING (customer_user = current_user);
CREATE POLICY

Let’s switch over to the customer_a role and check out the results:

postgres=# set role customer_a;
postgres=> select * from milestones;
 id | customer_user |       name
----+---------------+-------------------
  1 | customer_a    | A great milestone
  2 | customer_a    | Another milestone

Nice! Because of our row-level security policy on the milestones table, we only see the rows where customer_user matches our current user, customer_a.

It would be really nice to create a view for these tables so that we can see all the events with their related milestone names. Let’s jump back to the postgres role and create the view:

postgres=# CREATE VIEW milestone_events_view AS
postgres-# SELECT milestone_id, m.name as milestone_name, e.name as event_name
postgres-# FROM milestone_events e
postgres-# JOIN milestones m ON e.milestone_id = m.id;
CREATE VIEW
postgres=# GRANT SELECT ON milestone_events_view TO customer;
GRANT

Let’s switch back over to our customer_a role and take a look:

postgres=> SELECT * FROM milestone_events_view;
 milestone_id |       milestone_name       |   event_name
--------------+----------------------------+----------------
            1 | A great milestone          | First task
            1 | A great milestone          | Second task
            2 | Another milestone          | Another task
            3 | Customer B milestone       | B event
            4 | Spooky invisible milestone | Invisible task

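Whoa! We shouldn’t be able to see all these other customers’ data! That was the whole point of the row level security policy we set up! As it turns out, PostgreSQL views by default adhere to the permissions of their owner (in this case the postgres superuser) rather than the current user. (PostgreSQL 15 and later let you change this per view with the security_invoker option; on earlier versions the owner’s permissions always apply.)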

How can we fix this? Changing the owner of the view wouldn’t help us because then all the customer users would just see customer_a’s data.

My solution was to create a function that does the selection. In Postgres, functions can either be run with the privileges of the user who owns them (by specifying SECURITY DEFINER), or as the user calling them (with SECURITY INVOKER).

CREATE FUNCTION customer_milestone_events()
RETURNS TABLE (
  milestone_id int,
  milestone_name varchar,
  event_name varchar
)
LANGUAGE sql
SECURITY INVOKER
AS $$
  SELECT milestone_id, m.name AS milestone_name, e.name AS event_name
  FROM milestone_events e
  JOIN milestones m ON e.milestone_id = m.id
$$;

In order to make the results conveniently available as a view, we can create a view based on this function:

CREATE VIEW pub_milestone_events AS SELECT * FROM customer_milestone_events();
GRANT SELECT ON pub_milestone_events TO customer;

Now, when we switch over to our customer_a role and query our new view, we only see the rows we’re supposed to see:

postgres=> select * from pub_milestone_events ;
 milestone_id |  milestone_name   |  event_name
--------------+-------------------+--------------
            1 | A great milestone | First task
            1 | A great milestone | Second task
            2 | Another milestone | Another task

And as customer_b:

postgres=> select * from pub_milestone_events ;
 milestone_id |    milestone_name    | event_name
--------------+----------------------+------------
            3 | Customer B milestone | B event

Tada! Row level security on views in PostgreSQL.

MIG welding

2020-02-13T00:00:00Z

Recently, I took a MIG welding class at Artisan’s Asylum in Somerville, MA. I wanted to document what I learned so that I can refer back to it in the future, so here it is!

Safety

There are four primary hazards:

Burns. You’re dealing with liquid metal, so the work piece will remain hot even after you finish a weld. Also, small balls of molten steel will fly away from the work area and can burn through clothing and footwear, or ignite flammable objects nearby. Precautions: welding jacket, welding gloves, face mask, eye protection. Scan the surrounding area for fire hazards before welding.
Electric shock. MIG welding is an electrical process in which current is passed between the welding wire and the ground clamp. Precautions: don’t weld in damp areas.
Radiation. Ultraviolet light emitted from the arc can damage eyesight. Precautions: welding face shield, preferably an auto-darkening one. A small amount of exposure still occurs during the delay between the arc flash and the sensor activating, but that is mostly a concern if you spend 40 hours a week welding.
Asphyxiation. MIG welding typically uses a gas mix called C25, a mix of 75% argon and 25% carbon dioxide, in order to displace oxygen from the work area which would oxidize the molten steel and prevent a solid weld from forming. As the gas mix displaces oxygen, welding in an enclosed area could result in hypoxia. Precautions: weld in a well-ventilated area. The welding shop at A^2 has an exhaust system.

Mechanics

There are two primary controls on the welding machine: voltage (a proxy for temperature) and feed speed. There’s a chart on the welding machine with recommended settings for types and thicknesses of sheet metal. Some machines have an “auto feed” setting which tends to work fairly well.

Don’t weld anything galvanized, as the zinc will vaporize and can cross the blood-brain barrier causing neurological damage.

There are three common gauges of welding wire: 0.023", 0.030", and 0.035". Of these, 0.030" is a good general purpose wire. When you switch out a reel, be sure that the feed rollers are fitted for the correct gauge. The number facing out on the reel is the groove in use, regardless of whether it’s actually on the same side of the number or the opposite side.

Try to avoid sharp loops in the gas cable: like a kink in a garden hose, a tight bend adds friction on the sleeve that holds the welding wire inside the cable.

You want to keep the nozzle about 3/8" away from the work piece.

After a weld, slag and ash may accumulate on the surface. This can be brushed off with a wire brush or ground down for a cleaner surface.

The welding tip is where the current is transferred from the electrode running down the cable into the welding wire. It’s relatively easy for the tip to become damaged and require replacement; it’s a 15-cent consumable part so not a big deal. To replace, pull off (don’t unscrew) the sheath from the end of the nozzle, unscrew the tip, and screw a new one in. The inner diameter of the sheath may also accumulate slag buildup over time which can easily be cleaned out using the tapered end of MIG welding pliers.

To warn people nearby, announce “welding” before starting a weld.

Two directions for welding: push and drag. Not much difference between them other than what you can see: drag allows you to see the bead you’re laying while push allows you to see where you’re going.

Tack Welds

Useful for tacking a piece in place before laying a bead. With about 3/8" of wire protruding from the tip, place the tip against the work surface, and depress the trigger for about 1 second.

Beads

A bead can be produced by dragging the end of the welding wire along the work piece. The bead should be about twice as wide as the work piece is thick; e.g. for 1/8" steel the bead should be about 1/4" wide. The width can be controlled by regulating the speed at which you drag the nozzle across the work surface.

A nicer bead can be laid by using a circular motion, which also produces the aesthetically pleasing waves.

The height of the bead should be fairly low, as the larger the angle between the sheet metal and the bead, the less sturdy the weld will be.

Fillets

This is basically just a bead laid to join two pieces at a right angle. Use the same basic technique, but you’ll be bumping the nozzle up against the side and bottom pieces in order to be close enough to the work area right at the join.

“Series of tacks”

Can be useful when welding very thin stock that welding a bead might melt right through. This works by using the thermal capacity of the previously welded tack to help absorb some of the heat. Place the wire right up against the previous tack for optimal heat dissipation.

Fill-in

You can fix a hole by placing a bunch of dots inside it. It won’t look super pretty, you’ll almost certainly need to grind it down afterwards, and it won’t be very strong. You could use it as an aesthetic (rather than structural) fix e.g. if it’ll be painted over afterwards.

How the Dewey Decimal Classification Works

2020-02-11T00:00:00Z

The Dewey Decimal Classification (DDC) is widely used in libraries to organize their collections. I think a lot of people have probably used the DDC to find a book in a library, and a lot of people generally know how it works: number ranges correspond to high-level topics, with more numbers in the middle to fill in more specific subjects. You might be familiar with the table of main classes:

000	Computers and general information
100	Philosophy and Psychology
200	Religion
300	Social Sciences
400	Language
500	Math and Science
600	Technology
700	Art
800	Literature
900	History and Geography

I’ve always been interested in how the rest of the digits were decided on, so I decided to learn more! Surprisingly, it’s a bit challenging to find references on the DDC because it’s actually sort of a proprietary system. It’s managed and published by the Online Computer Library Center (OCLC), and they’re quite happy to sell you the DDC or access to WebDewey for many hundreds of dollars.

After some further digging, I came across an online class on the Dewey Decimal Classification from the Nebraska Library Commission. It’s three sessions of about an hour each. And now I know a lot more about how the DDC works!

The DDC was created in the 1870s by Melvil Dewey, who was a problematic person, and as a result the DDC has its share of issues. For these reasons and others, many libraries are moving away from the DDC to other systems such as the Library of Congress classification system or the BISAC subject codes used by many booksellers. Though it might be in decline, it’s widely-enough used that I still wanted to learn more about it.

The DDC organizes works into one of the ten main classes shown above. Each class has ten divisions (the second digit), and each division has ten sections (the third digit). There are further subdivisions that can be applied for more specific works. Overall, this forms a tree structure in which each subsequent digit traverses down the tree to a more specific topic. Works are classified into the node which is as specific as possible, so in general a shorter number or a number with fewer non-zero terminal digits will refer to a work that covers a broader range of topics.

In order to properly class works, there are two primary variables to consider: the subject/topic and the discipline. For example, you might class a work on dogs either in 599.77 (Natural sciences and mathematics > Animals > Mammals > Carnivores > Dog Family), or in 636.7 (Technology > Agriculture and related technologies > Animal husbandry > Dogs), depending on whether it was a book about the physiology of dogs or on keeping dogs as pets.

Of course, sometimes a work covers multiple topics or even multiple disciplines. The DDC has rules which dictate how these situations should be handled. (To continue the dog example, if you look in 599.77, there is a note which says “class interdisciplinary works on dogs in 636.7,” so if a work covered both the biology and raising of dogs, it should be classed in 636.7).

If you were to buy a hard copy of the DDC, you’d notice that there are a few different parts. The main part that people think of as the DDC is called the “schedules.” This is the big list of all the top-level numbers, arranged into chapters for each main class. There’s also an introduction, which has rules for deciding where works should be classed. For example the rule of fuller treatment says that if a work covers two or more topics, but covers one topic more fully than all the others, the work should be classed under that topic. There’s also the rule of two, which states that if a work covers two topics fairly equally, it should be classed under the lower number. For example, a work with equal treatment of French bulldogs (636.72) and Welsh corgis (636.737) should be classed under the lower number, 636.72.
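The two rules just described can be sketched in a few lines of code. This is only an illustration under simplifying assumptions: “how fully a topic is treated” is reduced to a made-up page count, “fairly equally” is reduced to an exact tie, and the topic type and classFor function are names I invented:

```go
package main

import "fmt"

// topic pairs a DDC number with a hypothetical measure of how fully
// the work treats it (here, a page count).
type topic struct {
	number string
	pages  int
}

// classFor picks the number for a work that covers two topics.
func classFor(a, b topic) string {
	switch {
	case a.pages > b.pages: // rule of fuller treatment
		return a.number
	case b.pages > a.pages:
		return b.number
	case a.number < b.number: // rule of two: equal treatment, lower number wins
		return a.number
	default:
		return b.number
	}
}

func main() {
	bulldogs := topic{number: "636.72", pages: 40}
	corgis := topic{number: "636.737", pages: 40}
	fmt.Println(classFor(bulldogs, corgis))
}
```

Comparing the numbers as plain strings works here because every DDC number has exactly three digits before the decimal point, so string order and numeric order agree.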

In addition to the introduction and the schedules, there’s also the manual which helps you resolve some specific situations (usually you’ll see a note in the schedules like “See manual 636.72-636.75” that points you to go there), the relative index, and the tables. The relative index is generally the starting point for classifying a work. You can look up a topic alphabetically, and you’ll be pointed to all the different possible classifications. And finally, the tables, which help classify works more specifically.

This introduces a topic called “number building.” The DDC doesn’t actually contain a specific entry for each possible topic, but relies on adding standard subdivisions to numbers listed in the schedules. Table 1 contains the standard subdivisions, which you can add as a suffix to pretty much any number you find in the schedules. The standard subdivisions include:

—01	Philosophy and theory
—02	Miscellany
—03	Dictionaries, encyclopedias, concordances
—04	Special topics
—05	Serial publications
—06	Organizations and management
—07	Education, research, and related topics
—08	Groups of people
—09	History, geographic treatment, biography

For example, an encyclopedia of programming languages could be classed under 005 (Computer programming, programs, data), .1 (programming), 3 (programming languages), —03 (Dictionaries, encyclopedias, concordances) to yield the number 005.1303. The book Cracking the Coding Interview, which is about the job of programming, could be classed under 005.1023, again using 005.1 (programming) and adding —023, the standard subdivision for “the subject as a profession, occupation, hobby.”
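Mechanically, this kind of number building is just digit concatenation with the decimal point kept after the third digit. Here is a small sketch of that step; buildNumber is a name I made up, and it deliberately ignores the schedule-specific instructions that can change where a subdivision goes:

```go
package main

import (
	"fmt"
	"strings"
)

// buildNumber appends the digits of a standard subdivision to a base
// number and puts the decimal point back after the third digit.
func buildNumber(base, subdivision string) string {
	digits := strings.ReplaceAll(base, ".", "") + subdivision
	if len(digits) <= 3 {
		return digits
	}
	return digits[:3] + "." + digits[3:]
}

func main() {
	fmt.Println(buildNumber("005.13", "03")) // encyclopedia of programming languages
	fmt.Println(buildNumber("005.1", "023")) // programming as a profession
}
```

These two calls reproduce the numbers from the examples above, 005.1303 and 005.1023. A real classifier would still have to consult the schedules, since individual entries can override the default placement.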

There are four tables in total; table 2 is used in conjunction with the 09 standard subdivision from table 1, e.g. a book about the architecture of Boston might be classed under 720.9744, with 720 being the architecture, 09 being the standard subdivision for geographic treatment, and 744 being the suffix from table 2 for Massachusetts. You might expect the number to be 720.09744, and it would be, except that in the schedules under 720, we are instructed to put the standard subdivisions in .1 through .9.

Table 3 contains subdivisions for literatures and literary forms and is only used with the main class 800 Literature. For example, a collection of American plays might be classed as 81 (American literature in English) + 2 (the subdivision from table 3 for Drama) to get 812 as the result.

Finally, table 4 contains subdivisions for languages, and is only used with the 400 Language main class. It’s used to break down specific attributes of language, such as —3 for dictionaries. So Webster’s dictionary would be classed as 420 (English and Old English) + 3 (dictionaries) = 423.

There’s a lot to the system, and while there is still a lot I don’t know, I now know a lot more about how it works than I did previously! If I got something wrong here, please email me about it! I’d love to learn more.

Solving the SQL Murder Mystery

2019-12-20T00:00:00Z

I saw this SQL Murder Mystery appear on Hacker News recently, thought it sounded fun, and figured I’d do a quick write-up of how I worked through it.

If you want to follow along, go ahead and download the SQLite database (which is copyright NUKnightLab and redistributed here under the MIT license). You’ll need some kind of SQLite client to interact with it (I just used the sqlite3 CLI tool).

In addition to the database, it’s very helpful to start with a prompt:

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan. 15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database. If you want to get the most out of this mystery, try to work through it only using your SQL environment and refrain from using a notepad.

Let’s start by seeing what tables are available. The sqlite3 CLI uses meta-commands that start with a dot, like this:

sqlite> .tables
crime_scene_report      get_fit_now_check_in    interview
drivers_license         get_fit_now_member      person
facebook_event_checkin  income                  solution

Okay, let’s start with finding our crime scene report. First, we’ll need to know what the data looks like. We can learn about this with the .schema command:

sqlite> .schema crime_scene_report
CREATE TABLE crime_scene_report (
        date integer,
        type text,
        description text,
        city text
    );

Okay, seems pretty straightforward. The only thing I’m not quite sure about is how the date is being represented -- it’s just stored as an integer. A UNIX timestamp perhaps? Let’s sample the data:

sqlite> select date from crime_scene_report limit 5;

date
20180115
20180115
20180115
20180215
20180215

Okay, seems it’s just being stored as YYYYMMDD. Let’s take a crack at finding the crime scene report! We know the type (murder) and the city (SQL City). Let’s be generous with the date and assume it was sometime in January of 2018:

sqlite> select * from crime_scene_report
   ...> where type = 'murder'
   ...> and city = 'SQL City'
   ...> and date between 20180101 and 20180131;

date	type	description	city
20180115	murder	Security footage shows that there were 2 witnesses. The first witness lives at the last house on "Northwestern Dr". The second witness, named Annabel, lives somewhere on "Franklin Ave".	SQL City
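A side note on why the between filter works: when dates are stored as YYYYMMDD integers, numeric order and chronological order are the same thing. Here is a small Python sketch of converting one of these integers into a real date. The parse_yyyymmdd name is made up for illustration and is not part of the puzzle.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Turn an integer like 20180115 into a date object."""
    return datetime.strptime(str(value), "%Y%m%d").date()


crime_date = parse_yyyymmdd(20180115)
print(crime_date.isoformat())  # 2018-01-15

# Integer comparison agrees with date comparison, so "between" is safe here.
assert 20180101 <= 20180115 <= 20180131
assert date(2018, 1, 1) <= crime_date <= date(2018, 1, 31)
```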

Great, there’s only one row that matches our broad date criteria! Let’s see if we can track down these witnesses. First, let’s see how the data we need is structured:

sqlite> .schema person
CREATE TABLE person (
        id integer PRIMARY KEY,
        name text,
        license_id integer,
        address_number integer,
        address_street_name text,
        ssn integer,
        FOREIGN KEY (license_id) REFERENCES drivers_license(id)
    );
sqlite> .schema interview
CREATE TABLE interview (
        person_id integer,
        transcript text,
        FOREIGN KEY (person_id) REFERENCES person(id)
    );

Okay, so we need to find the two rows in the person table, and then use their ids to cross reference their interview text. This is “the big idea” with relational databases, joining data in several tables based on something they have in common.

We’ll start with the witness who lives on Northwestern Drive. We know that they live in “the last house,” which presumably has the highest house number on that street. We can easily find this by first filtering for only people who live on Northwestern Drive, then ordering those results by house number in descending order, and only showing the first result:

sqlite> select * from person
   ...> where address_street_name = 'Northwestern Dr'
   ...> order by address_number desc
   ...> limit 1;

id	name	license_id	address_number	address_street_name	ssn
14887	Morty Schapiro	118009	4919	Northwestern Dr	111564949

Great! Now let’s find Annabel. We can use SQL’s LIKE operator to match a partial name, along with the name of their street:

sqlite> select * from person
   ...> where name like 'Annabel%'
   ...> and address_street_name = 'Franklin Ave';

id	name	license_id	address_number	address_street_name	ssn
16371	Annabel Miller	490173	103	Franklin Ave	318771143

Okay, so we’ve got our person IDs: 14887 and 16371. I think we’re going to want these IDs in a bunch of upcoming queries, so let’s help our future selves out by saving their IDs as parameters (a sort of temporary variable):

sqlite> .parameter set $MORTY 14887
sqlite> .parameter set $ANNABEL 16371

Let’s grab their interviews. To do this, we’ll put joins to use for the first time so we can show their name rather than just their person ID. We’re selecting records from the interview table, but joining matching records from the person table, using the person_id column to match up the people.

sqlite> select person.name, interview.transcript
   ...> from interview
   ...> join person on person.id = interview.person_id
   ...> where person_id in ($MORTY, $ANNABEL);

name	transcript
Morty Schapiro	I heard a gunshot and then saw a man run out. He had a "Get Fit Now Gym" bag. The membership number on the bag started with "48Z". Only gold members have those bags. The man got into a car with a plate that included "H42W".
Annabel Miller	I saw the murder happen, and I recognized the killer from my gym when I was working out last week on January the 9th.
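If you are working from a script rather than the sqlite3 CLI, named parameters play the same role as .parameter set. The following is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. The two tables are cut down to the columns the join needs, the transcripts are shortened placeholders, and the third person is made up so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table person (id integer primary key, name text);
    create table interview (person_id integer, transcript text);
    -- Toy data: two real IDs from the walkthrough plus one invented row.
    insert into person values
        (14887, 'Morty Schapiro'),
        (16371, 'Annabel Miller'),
        (1, 'Somebody Else');
    insert into interview values
        (14887, 'placeholder transcript for Morty'),
        (16371, 'placeholder transcript for Annabel'),
        (1, 'placeholder transcript for an unrelated person');
    """
)

# Named parameters stand in for $MORTY and $ANNABEL.
witnesses = {"morty": 14887, "annabel": 16371}

rows = conn.execute(
    """
    select person.name, interview.transcript
    from interview
    join person on person.id = interview.person_id
    where interview.person_id in (:morty, :annabel)
    order by person.id
    """,
    witnesses,
).fetchall()

for name, transcript in rows:
    print(name, "->", transcript)
```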

Okay, we’ve got tons of info now! Since the car and bag might not belong to the killer, I think our best lead for narrowing things down is to see all the people who crossed paths with Annabel at the gym on January 9th, 2018. Let’s see what those tables look like:

sqlite> .schema get_fit_now_check_in
CREATE TABLE get_fit_now_check_in (
        membership_id text,
        check_in_date integer,
        check_in_time integer,
        check_out_time integer,
        FOREIGN KEY (membership_id) REFERENCES get_fit_now_member(id)
    );
sqlite> .schema get_fit_now_member
CREATE TABLE get_fit_now_member (
        id text PRIMARY KEY,
        person_id integer,
        name text,
        membership_start_date integer,
        membership_status text,
        FOREIGN KEY (person_id) REFERENCES person(id)
    );

Alright, time to look for some check-ins! We could do this in two separate queries, one to find Annabel’s Get Fit Now member ID by using her person_id, and a second query to find her check-ins using her membership_id, but we can also use a sub-query to do this in one shot:

sqlite> select check_in_time, check_out_time
   ...> from get_fit_now_check_in
   ...> where check_in_date = 20180109
   ...> and membership_id = (
   ...>   select id
   ...>   from get_fit_now_member
   ...>   where person_id = $ANNABEL);

check_in_time	check_out_time
1600	1700

Looks like Annabel was at the gym from 4pm to 5pm on the 9th. Since we’re looking for someone who overlapped with Annabel at the gym, we’re looking for someone who arrived before 5pm and left after 4pm. Again, we’ll join some tables together here so we can grab their names and person IDs right away, not just their membership numbers:

sqlite> select person.id, person.name, get_fit_now_member.id,
   ...>   get_fit_now_check_in.check_in_time,
   ...>   get_fit_now_check_in.check_out_time
   ...> from get_fit_now_check_in
   ...> join get_fit_now_member on get_fit_now_member.id = membership_id
   ...> join person on person.id = person_id
   ...> where check_in_date = 20180109
   ...> and check_in_time <= 1700 and check_out_time >= 1600;

id	name	id	check_in_time	check_out_time
28819	Joe Germuska	48Z7A	1600	1730
67318	Jeremy Bowers	48Z55	1530	1700
16371	Annabel Miller	90081	1600	1700
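The overlap condition is worth spelling out, since it is easy to get backwards: two visits overlap when each one starts no later than the other ends. Here is a minimal Python sketch of the same test as the where clause, using the times from the results above. The overlaps function and the "Early Bird" visitor are made up for illustration.

```python
def overlaps(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """True if the two visits share any time (endpoints inclusive, like the query)."""
    return start_a <= end_b and end_a >= start_b


annabel = (1600, 1700)

visits = {
    "Joe Germuska": (1600, 1730),
    "Jeremy Bowers": (1530, 1700),
    "Early Bird (made up)": (900, 1000),  # left long before Annabel arrived
}

suspects = [
    name
    for name, (check_in, check_out) in visits.items()
    if overlaps(check_in, check_out, *annabel)
]
print(suspects)  # ['Joe Germuska', 'Jeremy Bowers']
```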

Interesting, there were only two other gym members who were checked in for a period overlapping with Annabel on the 9th. Let’s save their IDs as well:

sqlite> .parameter set $JOE 28819
sqlite> .parameter set $JEREMY 67318

Their member numbers both start with 48Z; let’s take a look at their vehicles, presumably in the drivers_license table:

sqlite> .schema drivers_license
CREATE TABLE drivers_license (
        id integer PRIMARY KEY,
        age integer,
        height integer,
        eye_color text,
        hair_color text,
        gender text,
        plate_number text,
        car_make text,
        car_model text
    );

sqlite> select person.id, person.name, drivers_license.*
   ...> from person
   ...> join drivers_license on drivers_license.id = person.license_id
   ...> where person.id in ($JOE, $JEREMY);

id	name	id	age	height	eye_color	hair_color	gender	plate_number	car_make	car_model
67318	Jeremy Bowers	423327	30	70	brown	brown	male	0H42W2	Chevrolet	Spark LS

So only Jeremy Bowers has a driver’s license. And his car’s license plate does contain H42W, so it looks like we’ve found the killer! According to the instructions in the GitHub repository, we should insert our answer into the solution table, then query it:

sqlite> insert into solution values (1, 'Jeremy Bowers');
sqlite> select value from solution;

value
Congrats, you found the murderer! But wait, there's more... If you think you're up for a challenge, try querying the interview transcript of the murderer to find the real villian behind this crime. If you feel especially confident in your SQL skills, try to complete this final step with no more than 2 queries.

Aha! We did correctly identify Jeremy Bowers. Let’s see if we can connect the dots to find the mastermind! First, we’ll grab the killer’s (Jeremy’s) interview transcript:

sqlite> select transcript from interview where person_id = $JEREMY;

transcript
I was hired by a woman with a lot of money. I don't know her name but I know she's around 5'5" (65") or 5'7" (67"). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.

Alright, there goes one query... one more to make it count! We’re going to be correlating data from a bunch of tables here. We’ll start from person and join income (related by SSN), probably using it as a sort criterion since we don’t have an exact figure to work with. We can grab height, hair color, gender, and car make/model from the driver’s licenses. It’s a bit of a risk to filter by Facebook check-ins to the SQL Symphony, since we don’t know that she checked in at all, but we can at least include a count of her check-ins at the symphony during December. Let’s get a reminder of what these tables look like:

sqlite> .schema person
CREATE TABLE person (
        id integer PRIMARY KEY,
        name text,
        license_id integer,
        address_number integer,
        address_street_name text,
        ssn integer,
        FOREIGN KEY (license_id) REFERENCES drivers_license(id)
    );
sqlite> .schema income
CREATE TABLE income (
        ssn integer PRIMARY KEY,
        annual_income integer
    );
sqlite> .schema facebook_event_checkin
CREATE TABLE facebook_event_checkin (
        person_id integer,
        event_id integer,
        event_name text,
        date integer,
        FOREIGN KEY (person_id) REFERENCES person(id)
    );
sqlite> .schema drivers_license
CREATE TABLE drivers_license (
        id integer PRIMARY KEY,
        age integer,
        height integer,
        eye_color text,
        hair_color text,
        gender text,
        plate_number text,
        car_make text,
        car_model text
    );

And assemble our final mega-query!

sqlite> select p.id, p.name, i.annual_income, dl.height, dl.hair_color,
   ...> dl.gender, dl.car_make, dl.car_model, (
   ...>   select count(*)
   ...>   from facebook_event_checkin
   ...>   where person_id = p.id
   ...>   and event_name like '%symphony%'
   ...>   and date between 20171201 and 20171231) as num_symphonies
   ...> from person p
   ...> join income i on i.ssn = p.ssn
   ...> join drivers_license dl on dl.id = p.license_id
   ...> where dl.height between 64 and 68
   ...> and dl.hair_color like '%red%'
   ...> and car_make like '%tesla%'
   ...> and car_model like '%s%'
   ...> order by i.annual_income desc;

id	name	annual_income	height	hair_color	gender	car_make	car_model	num_symphonies
99716	Miranda Priestly	310000	66	red	female	Tesla	Model S	3
78881	Red Korb	278000	65	red	female	Tesla	Model S	0

Okay, so we actually got two results for red-haired people around 66" tall who make a lot of money and drive Tesla Model S’s. However, one of them attended the symphony three times in December (and makes even more money), so I think we’ve found the mastermind!

I didn’t include gender in the filter as I wasn’t sure how the data looked, and I technically would’ve needed an additional query to discover that.

sqlite> insert into solution values (1, 'Miranda Priestly');
sqlite> select value from solution;

value
Congrats, you found the brains behind the murder! Everyone in SQL City hails you as the greatest SQL detective of all time. Time to break out the champagne!

Hooray! I had a lot of fun playing through this, and would love to do another similar puzzle again sometime.

(Almost) Pure CSS Material-like Text Fields

2019-09-19T00:00:00Z

Despite what you may believe from simply looking at this site, I’ve actually done quite a bit of front-end development. A couple of years ago, I worked on a project with a friend of mine. For part of the project, he’d designed the behavior of a form control inspired by Material Design which I then built from scratch. Recently, he asked me to remind him how I’d implemented it, and I thought I’d take the opportunity to turn it into a blog post.

Here’s what it looks like:

First Name

It’s not quite pure CSS, but it’s pretty close. Let’s think about how this is put together.

At a high level, the appearance of the text field at any given moment is the result of two CSS classes, focused and populated, being added and removed via JavaScript. On this page, I’ve simply written a few lines of code to add and remove them at the proper times, but in practice this is probably best done through your frontend JavaScript framework (Angular/React/Vue/...), if you’re using one.

First, let’s talk about the moving placeholder. While CSS does have a ::placeholder pseudo-element that we can use for styling how the placeholder attribute of the <input> is displayed, unfortunately we can’t use it here because we want the placeholder to remain visible while the user edits the field, and the browser-supplied placeholder vanishes when the field isn’t empty.

Another semantically-useful way to display this is the <label> element, so that’s what I’ve used. The label is absolutely positioned to appear over the <input> where you’d expect the placeholder. So our basic markup looks like this:


  
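<div class="form-group">
  <input class="form-control">
  <label class="control-label">
    First Name
  </label>
</div>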

When the populated class is applied to the form-group div, an extra CSS rule gets applied to the control-label, changing its position, size, and color. CSS transitions are used to gently animate the movement.

The next interesting element is the heavy bottom border. It would be nice if we could simply use border-bottom on the <input>, but we want to animate it collapsing and expanding, and that wouldn’t be possible using border-bottom without also collapsing and expanding the content of the text input, which we definitely don’t want.

The solution I came up with was to use the ::after pseudo-element to just display a block of color. At rest, it has width: 0, but when the focused class is applied to the containing form-group, then it gets width: 100% and is again animated using CSS transitions.

This is annoyingly close to pure CSS. There are some hacks that can get even closer to being pure CSS, like using the CSS sibling combinator ~ to write rules like

.form-control:focus ~ .control-label {
  /* the control is focused, move the label to the top */
}

but the ultimate stumbling block is that there’s no way to use the current value of the text input in a CSS rule, so we can’t make the label disappear when the input is blurred and non-empty. You can of course use an attribute selector in your CSS like input:not([value='']), but this only considers the actual original attribute value, not whatever it might get changed to by the user later on. You could of course write some JavaScript to make that happen, but if you’ve resorted to JavaScript then you may as well just use the easier and cleaner approach that toggles the classes.

There is one way I thought of that could work to do a pure CSS implementation. There’s a :valid pseudo-class that considers the HTML form validation state. If we make the <input> only valid when it is non-empty, either with the pattern or required attributes, then we could write a rule like

.form-control:not(:focus):valid ~ .control-label {
  /* the control is blurred and has a value, hide the label */
}

However, :valid isn’t supported in all browsers, and this presumes you aren’t using the HTML form validation for anything else, so it’s a little too hacky to rely on. In our case, we were already using React, so adding and removing the classes with JavaScript ended up being quite easy.

Check out the source code for this page to get the code; I promise it’s easy to understand!

Buzzword-Driven “Pop Infosec”

2019-08-06T00:00:00Z

Information security is complicated. When you combine that with the fact that an increasing number of people seem to also consider it to be very important, the result is something I like to call “pop infosec.”

As in pop science or popular psychology, making information security accessible often involves simplifying concepts to improve their general palatability, which results in laypeople overestimating their own understanding. This “easiness effect” has been studied in the context of science communication, and likely applies to information security in a parallel sense.

While helping people protect themselves from security threats is certainly laudable, it’s important to do it responsibly in order to maximize benefit and minimize harm. Unfortunately, a few recent events I’ve noticed personally suggest that this is not happening.

“The Cloud”

I recently read (part of) an article in the Wall Street Journal (before I got cut off by their paywall) about a data breach which read:

The data was stored on Amazon.com Inc.’s cloud, according to a federal criminal complaint and people familiar with the matter. The avenue of entry, the companies and investigators said, was a poorly configured firewall [...]

Both companies say controls around the data, rather than use of the cloud, were the problem. Still, the data was stored in the cloud, raising questions about whether Capital One put insufficient safeguards in place to lock down customer records when it adopted cloud technology.

Clearly, the reporter has decided to inject some good old “ZOMG all ur dataz are in teh cloud” fear mongering. That aside, this is some of the worst analysis I’ve seen. Imagine you’re trying to keep a box of papers safe; the problem isn’t that you kept the box in a self storage unit instead of in your house, the problem is that you left the door unlocked. If the company had a poorly configured cloud environment, why should I expect them to properly configure a firewall in some other environment?

In other words, the WSJ has this almost right: it does raise questions about whether sufficient safeguards were in place, but these questions are orthogonal to any particular technologies or events.

This is simply confusion of correlation and causation. To cite a common example, suppose you thought drowning deaths were a large problem and you learned that there was a strong correlation between ice cream sales and drowning deaths. Recognizing that swimming and eating ice cream are simply both summertime activities, one would of course be mistaken to conclude that banning ice cream would reduce the number of drowning deaths. Likewise, as more companies start using cloud services, we should certainly not be surprised that more vulnerabilities affecting cloud services are discovered.

For the record, I certainly do not believe that “the cloud” is a panacea, but that security is only meaningful relative to a threat model which may or may not involve where hardware happens to be physically located.

“High Severity Vulnerability”

Apparently, all it takes to waste lots of time and energy and kick up a big fuss is to label something as “high severity.”

Consider this notice I saw when I logged on to GitHub one day:

Clicking “See security alert” led me to the following notice:

I looked up CVE-2019-10742 and quickly located the relevant pull request for axios. To save you some clicks, axios is a JavaScript HTTP client library which includes an API like this:

axios
  .get('http://example.com/evil.txt')
  .then(console.log)
  .catch(console.error);

Optionally, you can use the get API like this:

axios
  .get('http://example.com/evil.txt', { maxContentLength: 100 })
  .then(console.log)
  .catch(console.error);

in which case axios is expected to abort the response and reject the promise after more than 100 bytes have been received. However, there was a bug in the implementation where the promise would be rejected but reading from the stream would continue, hence the CVE. But look at the code snippets above! This CVE only applies to codebases which actually use the maxContentLength option! If you weren’t using maxContentLength, you weren’t expecting any responses to be truncated in the first place. Nonetheless, I found lots of comments like

will need to roll out a fix for compliance asap

When will this issue be fixed? I have received tons of mail from github regarding axios.

I can help work on it if needed, but we would need to get rid of axios otherwise on an open source SDK I’m actively maintaining

we really need to get a fix out, especially seeing as we’re now getting Github notifications on this.

Thanks to the way GitHub shows references from other issues/pull requests, I was also able to see how people were responding to the vulnerability alert within their own code. Of the random sampling of projects with linked issues/PRs I audited, none of them actually used the maxContentLength option, but dutifully updated the version of their axios dependency and considered the issue resolved.

In reality, nothing about these projects’ security posture actually changed, though their maintainers may have thought it did. The real resolution for many of these projects would be to first consider the impact if maxContentLength was not set or respected, and if appropriate, update the dependency and actually use maxContentLength.
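To make "respecting the limit" concrete, here is a minimal Python sketch of a size-limited read. It is not axios's code (axios is JavaScript, and this does not reproduce its internals); read_limited, ResponseTooLarge, and the chunk size are names and values made up for illustration. The point is the behavior described above: once the limit is exceeded, the reader both reports an error and stops pulling data from the stream.

```python
import io


class ResponseTooLarge(Exception):
    """Raised when a response body exceeds the caller's size limit."""


def read_limited(stream, max_content_length: int, chunk_size: int = 16) -> bytes:
    """Read a stream in chunks, giving up as soon as the limit is exceeded."""
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(received)
        received.extend(chunk)
        if len(received) > max_content_length:
            # Raising here also ends the loop, so no further reads happen.
            # The bug described above was the equivalent of reporting the
            # error while the reading carried on.
            raise ResponseTooLarge(f"more than {max_content_length} bytes received")


# A 1000-byte "response" read with a 100-byte limit.
big_response = io.BytesIO(b"x" * 1000)
try:
    read_limited(big_response, max_content_length=100)
except ResponseTooLarge as error:
    print("rejected:", error)

# Only a little past the limit was consumed, not the whole body.
print("bytes consumed:", big_response.tell())
```

Without a limit in place there is nothing to enforce, which is why upgrading alone changes nothing for a project that never set one.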

Of course, this is not the fault of the developers. Collectively, one of the biggest things we tell people about protecting themselves from vulnerabilities is to keep their software up to date. In this case, developers saw a helpful message saying to update their dependencies, they updated them (possibly even with the automatic click of a button!), and they still might have been vulnerable.

In Conclusion

Information security professionals need to be judicious about how and what is communicated with or recommended to the public. As we’ve seen, “pop infosec” can be ineffective or even harmful. And journalists need to ensure that their reporting is consistent with evidence-based research.

I have said before that security is not a checklist, it is a mindset. You can’t “be secure” by following some steps you find online or by avoiding certain technologies. The most effective way to improve your security posture is to hire smart people to think critically about your environment.

Vim vs Neovim on FreeBSD

2019-07-22T00:00:00Z

I have a FreeBSD server which primarily serves as a jail host. As such, I’d like to keep its installed packages to a minimum. FreeBSD’s default install comes with vi, but not vim. Using vi feels familiar enough, but it becomes annoying not to have things like gg available. So I decided to install vim to make my life a little nicer:

$ sudo pkg install vim
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 103 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        vim: 8.1.1439
        libXpm: 3.5.12_2
        libXext: 1.3.4,1
        libXau: 1.0.9
        libX11: 1.6.8,1
        libxcb: 1.13.1
        libXdmcp: 1.1.3
        xorgproto: 2019.1
        libxml2: 2.9.9
        libpthread-stubs: 0.4
        libXt: 1.2.0,1
        libSM: 1.2.3,1
        libICE: 1.0.9_3,1
        pango: 1.42.4_2
        libXrender: 0.9.10_2
        xorg-fonts-truetype: 7.7_1
        font-misc-meltho: 1.0.3_4
        mkfontscale: 1.2.1
        libfontenc: 1.1.4
        freetype2: 2.10.0
        fontconfig: 2.12.6,1
        font-misc-ethiopic: 1.0.3_4
        font-bh-ttf: 1.0.3_4
        encodings: 1.0.5,1
        font-util: 1.3.1
        dejavu: 2.37_1
        libXft: 2.3.2_3
        harfbuzz: 2.5.3
        graphite2: 1.3.13
        cairo: 1.16.0,2
        pixman: 0.34.0_1
        png: 1.6.37
        mesa-libs: 18.3.2_1
        libxshmfence: 1.3
        libXxf86vm: 1.1.4_3
        libXfixes: 5.0.3_2
        libXdamage: 1.1.5
        wayland: 1.16.0_1
        libepoll-shim: 0.0.20190311
        libdrm: 2.4.98_1,1
        libpciaccess: 0.14
        pciids: 20190620
        libunwind: 20170615
        glib: 2.56.3_5,1
        xkeyboard-config: 2.27
        libXrandr: 1.5.2
        libedit: 3.1.20190324,1
        libepoxy: 1.5.2
        fribidi: 0.19.7
        gtk3: 3.24.9
        libxkbcommon: 0.8.4
        libXinerama: 1.1.4_2,1
        libXi: 1.7.10,1
        libXcursor: 1.2.0
        libXcomposite: 0.4.5,1
        adwaita-icon-theme: 3.28.0
        gtk-update-icon-cache: 2.24.32
        shared-mime-info: 1.10_1
        hicolor-icon-theme: 0.17
        gdk-pixbuf2: 2.36.12
        tiff: 4.0.10_1
        jpeg-turbo: 2.0.2
        jbigkit: 2.1_1
        atk: 2.28.1
        cups: 2.2.11
        gnutls: 3.6.8
        trousers: 0.3.14_2
        tpm-emulator: 0.7.4_2
        gmp: 6.1.2_1
        p11-kit: 0.23.16.1
        libtasn1: 4.13_1
        nettle: 3.4.1_1
        libidn2: 2.2.0
        libunistring: 0.9.10_1
        libpaper: 1.1.24.4
        avahi-app: 0.7_2
        gnome_subr: 1.0
        libdaemon: 0.14_1
        gobject-introspection: 1.56.1,1
        dbus-glib: 0.110
        dbus: 1.12.12
        gdbm: 1.18.1_1
        wayland-protocols: 1.17
        librsvg2: 2.40.20
        libcroco: 0.6.12
        libgsf: 1.14.44
        colord: 1.3.5
        polkit: 0.114_2
        spidermonkey52: 52.9.0_3
        nspr: 4.21
        icu: 64.2,1
        sqlite3: 3.28.0
        desktop-file-utils: 0.23
        lcms2: 2.9
        argyllcms: 1.9.2_4
        libXScrnSaver: 1.2.3_2
        at-spi2-atk: 2.26.2
        at-spi2-core: 2.28.0
        libXtst: 1.2.3_2
        ruby: 2.5.5_2,1
        libyaml: 0.2.2
        ctags: 5.8
        cscope: 15.8b_1

Number of packages to be installed: 103

The process will require 517 MiB more space.
96 MiB to be downloaded.

Whoa, what?! Why do I need wayland and gtk for vim? ^C^C^C

$ sudo pkg install neovim
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 7 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        neovim: 0.3.8
        luajit: 2.0.5_3
        unibilium: 2.0.0
        msgpack: 3.2.0
        libvterm: git20161218
        libuv: 1.30.1
        libtermkey: 0.22

Number of packages to be installed: 7

The process will require 28 MiB more space.
5 MiB to be downloaded.

Much more palatable.

FreeBSD Jail Networking Continued

2018-10-13T00:00:00Z

I decided to take another crack at the jail configuration I started in Experiment 1. After reading bits and pieces of a few random websites (including various ServerFault posts), on an inkling I added the line interface = "bge0"; to my /etc/jail.conf file and ran service jail restart www (bge0 is my LAN interface on the host). After jexecing in, I tried pkg install nginx again and it worked like a charm!

I also noticed that when I run ifconfig on my host now, both the original 10.0.2.201 and the jail’s 10.0.2.202 addresses have been added to the bge0 interface. I wondered whether that meant that I could now SSH into the host using the jail’s IP address. So on my laptop, I ran ssh bb@10.0.2.202 and lo and behold, it worked. The opposite, however, is not true: loading http://10.0.2.201 in a web browser does not give me the beautiful “welcome to nginx” page that http://10.0.2.202 has.

I’m sure some trickier stuff will arise when dealing with NAT and multiple interfaces, but for now I’m satisfied that I have a basic understanding of how to set up a service in a jail and expose it to the network.

How does DHCP work?

2018-10-09T00:00:00Z

DHCP (Dynamic Host Configuration Protocol) is an integral part of most networks, from small home networks to campuses serving thousands of devices. I recently realized that I didn’t have a solid understanding of how it functions. I knew that DHCP was used to obtain an IP address from a central server when joining a network, but wasn’t clear on how that negotiation takes place. How could a machine without an IP address talk to a server that it didn’t know the address of?

To learn more, I started a Wireshark capture and then connected my computer to a network to see what happened. I immediately discovered that DHCP is part of the Bootstrap Protocol (also known as BOOTP), which is transported over UDP/IP. DHCP servers read and write on port 67, while DHCP clients read and write on port 68. Before the client has acquired an IP address, it uses 0.0.0.0 as the source address for packets it transmits, and addresses its packets to the broadcast address 255.255.255.255.

For the simple case that I examined, I found that there are four messages involved in acquiring an IP address: Discover, Offer, Request, and ACK. At a high level, the client broadcasts a request for an address, a DHCP server responds with an offer, the client makes a request based on the offer it received, and finally the DHCP server acknowledges the request.

Step 1: Discovery

The client sends a UDP broadcast packet from 0.0.0.0:68 to 255.255.255.255:67. This is a BOOTP Discover message that includes details about what information is being requested from the network’s authoritative DHCP server. In the case I observed, the following items were requested:

Subnet mask
Classless static route
Router
DNS server
Domain name
Proxy autodiscovery
LDAP server
NetBIOS Name Server
NetBIOS Node Type

A DHCP lease time of 90 days was requested, and my DHCP client identifier (MAC address) and hostname were also included.

In the case I observed, the first discovery packet did not receive an offer within 1.125 seconds, so a second discovery packet was transmitted. Since UDP does not guarantee delivery, it makes sense that a basic retransmission mechanism would be part of the protocol to handle dropped packets. While TCP uses a sequence number to correctly order packets, BOOTP appears to use a somewhat surprising metric: its header contains a “seconds elapsed” field which was set to 0 for the first discovery packet and 1 for the packet sent 1.125 seconds later.
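To get a feel for what is actually on the wire, here is a minimal Python sketch that assembles the bytes of a Discover message: the fixed 236-byte BOOTP header, the magic cookie, then the options. Several things are assumptions rather than values from the capture: the transaction ID and MAC address are made up, the option codes in PARAMETER_REQUEST_LIST are my understanding of the codes for the items listed above, and the lease time, client identifier, and hostname options are left out for brevity. It only builds the payload and does not send anything.

```python
import struct

# Fixed BOOTP header layout (236 bytes), network byte order:
# op, htype, hlen, hops, xid, secs, flags, ciaddr, yiaddr, siaddr, giaddr,
# chaddr (16 bytes), sname (64 bytes), file (128 bytes).
BOOTP_HEADER = "!BBBBIHH4s4s4s4s16s64s128s"
MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the options area

# Assumed option codes: subnet mask, classless static route, router,
# DNS server, domain name, proxy autodiscovery, LDAP, NetBIOS name server,
# NetBIOS node type.
PARAMETER_REQUEST_LIST = [1, 121, 3, 6, 15, 252, 95, 44, 46]


def build_discover(xid: int, mac: bytes, secs: int = 0) -> bytes:
    """Build the UDP payload of a DHCP Discover message."""
    no_address = bytes(4)  # 0.0.0.0, since the client has no address yet
    header = struct.pack(
        BOOTP_HEADER,
        1,          # op: request from client
        1,          # htype: Ethernet
        len(mac),   # hlen: hardware address length
        0,          # hops
        xid,        # transaction ID, echoed back by the server
        secs,       # "seconds elapsed" since the client started trying
        0,          # flags
        no_address, no_address, no_address, no_address,
        mac,        # padded to 16 bytes by struct
        b"",        # sname, unused here
        b"",        # file, unused here
    )
    options = bytes([53, 1, 1])  # option 53 (message type), length 1, Discover
    options += bytes([55, len(PARAMETER_REQUEST_LIST)]) + bytes(PARAMETER_REQUEST_LIST)
    options += bytes([255])      # end of options
    return header + MAGIC_COOKIE + options


# Made-up transaction ID and MAC address.
first = build_discover(xid=0x3903F326, mac=bytes.fromhex("020000000001"))
retry = build_discover(xid=0x3903F326, mac=bytes.fromhex("020000000001"), secs=1)
print(len(first), "bytes; secs field:", first[8:10].hex(), "then", retry[8:10].hex())
```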

Step 2: Offer

The server sends a UDP packet from 192.168.1.1:67 to 192.168.1.2:68 containing a DHCP Offer message. There are a few ways we can tell this offer is for us:

The BOOTP Transaction ID field is set to the value that we sent in our Discover packet
The Client MAC address field in the BOOTP message is set to ours
At the Ethernet layer, the destination address is also set to our MAC address

In this offer message, we get the responses to some of the questions we asked in our Discover packet. In this case, we are offered a lease time of 3600 seconds (one hour, much less than our requested 90 days). We are instructed to renew after 30 minutes, rebind after 52 minutes 30 seconds, and given a netmask of 255.255.255.0. We’re also informed of the router/DNS server’s address of 192.168.1.1 and supplied with the domain name home (so our machine’s “FQDN” will be <hostname>.home).
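Those two timers are not arbitrary: they match the default fractions that RFC 2131 gives for a lease, half the lease for renewal and 87.5% of it for rebinding. In this capture the server sent the values explicitly, so the arithmetic below is just a sanity check. The default_timers function is a made-up name for illustration.

```python
def default_timers(lease_seconds: int) -> tuple[int, int]:
    """Return (renewal, rebinding) times in seconds using RFC 2131's default fractions."""
    renewal = int(lease_seconds * 0.5)
    rebinding = int(lease_seconds * 0.875)
    return renewal, rebinding


def as_minutes_seconds(seconds: int) -> str:
    minutes, remainder = divmod(seconds, 60)
    return f"{minutes}m {remainder}s"


renewal, rebinding = default_timers(3600)
print(as_minutes_seconds(renewal))    # 30m 0s
print(as_minutes_seconds(rebinding))  # 52m 30s
```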

To figure out the address we have been offered, we can look at either the IP address that the packet was sent to, or we can examine the “Your IP” field in the BOOTP message.
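The checks described in this step can be sketched in a few lines of Python. This is only an illustration working on the raw UDP payload, using the fixed BOOTP header offsets; the function names are my own, and the Ethernet-layer destination check is left out because it is not visible at this level.

```python
import ipaddress
import struct


def offer_is_for_us(packet, our_xid, our_mac):
    """Check the transaction ID and client MAC fields of a BOOTP reply."""
    op, htype, hlen, hops, xid = struct.unpack("!BBBBI", packet[:8])
    chaddr = packet[28:28 + hlen]  # client hardware address starts at byte 28
    # op == 2 means BOOTREPLY, i.e. a message from a server.
    return op == 2 and xid == our_xid and chaddr == our_mac


def offered_address(packet):
    """Return the "Your IP" (yiaddr) field as a dotted-quad string."""
    return str(ipaddress.IPv4Address(packet[16:20]))
```

For the offer described above, offered_address would return the same 192.168.1.2 that the packet was addressed to.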

Step 3: Request

Now that we’ve received an offer, we make a request for the offer. This mostly involves reiterating the initial request, again sent from 0.0.0.0:68 to 255.255.255.255:67. Additionally, the message includes a “Requested IP” field that specifies the IP address from the Offer.
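As a rough sketch of what distinguishes the Request from the Discover, the options below carry the message type and the requested address. The server identifier option is included because RFC 2131 requires it when a client is selecting an offer; I did not list it above, so treat its presence here as an assumption about what the capture contained. The addresses in the usage note are the ones from the Offer.

```python
import ipaddress


def option(code, payload):
    """Encode one DHCP option as code, length, payload."""
    return bytes([code, len(payload)]) + payload


def request_options(offered_ip, server_ip):
    """Return the options that mark a message as a Request for one offer."""
    return b"".join([
        option(53, bytes([3])),                                 # message type: Request
        option(50, ipaddress.IPv4Address(offered_ip).packed),   # requested IP address
        option(54, ipaddress.IPv4Address(server_ip).packed),    # server identifier
        bytes([255]),                                           # end of options
    ])
```

Here request_options("192.168.1.2", "192.168.1.1") encodes a request for the offered address from the server that offered it.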

Step 4: Acknowledgement

Finally, the DHCP server acknowledges our request. This completes the process of IP address acquisition. The server reiterates the correct parameters it provided in the Offer, including the rebinding and renewal periods, netmask, etc.

Some observations: it makes sense to see UDP used for this protocol rather than TCP since TCP is connection-oriented and we don’t know the address of the server (nor our own address for that matter) at the beginning of this process. It’s also easy to imagine havoc being wreaked on a network by creating a rogue DHCP server that provides fake leases with conflicting IP addresses.

Armed with my basic knowledge of how DHCP functions, I wanted to better understand some of what I had encountered while experimenting. For instance, what is the difference between “rebinding” and “renewal”? What is the reason for using “seconds elapsed” as a kind of sequence number? My next stop to find answers was the IETF RFCs.

As of this writing, there have been three iterations of the DHCP RFC, along with a few other extension/option RFCs. All three were written by Ralph Droms of Bucknell University. The first two (RFC 1531 and RFC 1541) were published in October 1993, and the latest version, RFC 2131, was published in March 1997. For historical context, I wanted to learn what had changed throughout the versions, so I ran $ diff rfc1531.txt rfc1541.txt (this is one of those times that I love having the RFC repository available locally). There don’t seem to be any protocol changes between RFC 1531 and RFC 1541, just a few formatting and phrasing changes. Running diff on RFC 1531 and RFC 2131 produced quite a large output that I was not eager to read through, but conveniently, section 1.1 of RFC 2131 is called “Changes to RFC 1541”. The 1997 changes are described as:

This document updates the DHCP protocol specification that appears in RFC1541. A new DHCP message type, DHCPINFORM, has been added; see section 3.4, 4.3 and 4.4 for details. The classing mechanism for identifying DHCP clients to DHCP servers has been extended to include "vendor" classes as defined in sections 4.2 and 4.3. The minimum lease time restriction has been removed. Finally, many editorial changes have been made to clarify the text as a result of experience gained in DHCP interoperability tests.

Interestingly, the terms we’re used to seeing defined in RFC 2119 (MUST, MUST NOT, REQUIRED, etc) are specifically defined in the document. On closer inspection, RFC 2119 was also published in March 1997!

With regard to my lingering questions, I learned that “renewing” is when a client is attempting to renew its lease by recontacting the server that initially granted it. If the server can’t be contacted, or refuses to renew the lease, the client enters the “rebinding” state in which it tries to contact any DHCP server to renew its lease or obtain a new one.
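The renewal and rebinding times in the Offer line up with the defaults given in RFC 2131, which puts renewal (T1) at half the lease and rebinding (T2) at seven eighths of it. A server can send different values, so the short sketch below only shows the default arithmetic; the function names are my own.

```python
def lease_timers(lease_seconds):
    """Return (renew_at, rebind_at) in seconds using the RFC 2131 defaults."""
    renew_at = lease_seconds // 2        # T1: 0.5 of the lease
    rebind_at = lease_seconds * 7 // 8   # T2: 0.875 of the lease
    return renew_at, rebind_at


def as_minutes(seconds):
    """Format a number of seconds as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs:02d}s"
```

For the one-hour lease above, lease_timers(3600) gives 1800 and 3150 seconds, which are the 30 minutes and 52 minutes 30 seconds from the Offer.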

I was only able to find one mention of an actual use for the “seconds” field (on page 15):

To help ensure that any BOOTP relay agents forward the DHCPREQUEST message to the same set of DHCP servers that received the original DHCPDISCOVER message, the DHCPREQUEST message MUST use the same value in the DHCP message header's 'secs' field and be sent to the same IP broadcast address as the original DHCPDISCOVER message.

I did notice that there are a lot of sections with language like “a DHCP server MAY extend a client’s lease only if it has local administrative authority to do so” (emphasis added). But what if someone were to put a rogue DHCP server on the network, one that did not have “local administrative authority”? It’s probably quite possible to wreak a bit of havoc by creating a rogue DHCP server, though perhaps not quite as easy as it might seem. Since DHCP leases often last for some time (hours or days), existing clients might not be affected by the appearance of a new server for quite a while. Besides, due to the binding mechanism, when a client needs to renew its lease, it sends a unicast message directly to the server it initially obtained the lease from rather than immediately resorting to broadcasting a DHCPDISCOVER message.

Since DHCP is often employed on a contiguous physical network segment, it may not always be possible to use a firewall to block traffic to the server port (67). This would require some sort of Layer 2 firewall, which I’m sure exists, but doesn’t seem to be widely deployed (or recommended). It would of course be possible to set up rules on a Layer 3/4 firewall to block traffic to port 67 on machines not authorized to act as DHCP servers to prevent a rogue server from having any effect outside its physical segment.

In conclusion:

Wireshark is a great learning tool
RFCs are educational from a technical as well as a historical perspective
Now I know how DHCP works in a bit more depth

FreeBSD Experiment 1: Jails

2018-09-20T00:00:00Z

In my preparations for removing ESXi, I tried creating a simple jail on my test box helios. As part of my purpose is to learn as much as possible, I decided against using a tool like ezjail in favor of doing it “by hand.” While the FreeBSD Handbook has some information on creating jails without using additional tools, pretty much every other document I found suggested using ezjail. There’s a chance I’ll revisit ezjail in the future, as it seems to have some helpful features like having a “base jail” so you only need one copy of the FreeBSD base system, but for now I’d like to do as much as possible without additional tools.

My goal for this experiment was to set up a simple web server (nginx) inside a jail. To start, I edited /etc/jail.conf to contain the following:

www {
  host.hostname = www.local;
  ip4.addr = 10.0.2.202;
  path = "/usr/jail/www";
  exec.start = "/bin/sh /etc/rc";
  exec.stop = "/bin/sh /etc/rc.shutdown";
}

Next, I used bsdinstall(8) to install the base system instead of compiling from source:

root@helios:~ # bsdinstall jail /usr/jail/www

I then added jail_enable="YES" to /etc/rc.conf and started the jail:

root@helios:~ # service jail start www

This took a few seconds to complete, and then the jail showed up when I ran jls:

root@helios:~ # jls
   JID  IP Address      Hostname                      Path
     1  10.0.2.202      www.local                     /usr/jail/www

I was able to enter the jail:

root@helios:~ # jexec www /bin/sh
#

But I seem not to have Internet connectivity, as attempting to use pkg-ng fails:

# pkg install nginx
The package management tool is not yet installed on your system.
Do you want to fetch and install it now? [y/N]: y
Bootstrapping pkg from pkg+http://pkg.FreeBSD.org/FreeBSD:11:amd64/quarterly, please wait...
pkg: Error fetching http://pkg.FreeBSD.org/FreeBSD:11:amd64/quarterly/Latest/pkg.txz: Non-recoverable resolver failure
A pre-built version of pkg could not be found for your system.
Consider changing PACKAGESITE or installing it from ports: 'ports-mgmt/pkg'.

Running ifconfig inside the jail shows that I do not seem to have an IP address, nor can I seem to communicate with any hosts. Interestingly when I attempt to ping my gateway, I get the message:

ping: ssend socket: Operation not permitted

Clearly there’s something I’ve not yet figured out.

Notes on setting up a FreeBSD home server

2018-09-17T00:00:00Z

A few months ago, I purchased a beefy second-hand tower to act as a home server. I was looking to bring some of the services that I was previously outsourcing into a single location, and to expand my familiarity with networking and systems administration. Specifically, I wanted to:

Replace the small DigitalOcean box that I was using as a VPN/proxy when I needed to use public WiFi
Stop paying for a GitHub subscription to host private repositories
Have a better home media and file sharing/backup solution
Host a Minecraft server (nothing too serious, I occasionally play with a few friends)
Have a stable home for various VMs that I spin up as part of my security lab (I’ve been playing around with pen testing and trying to learn more about Windows as a part of this).

My initial solution was to install a free version of VMware ESXi as a hypervisor and create several virtual machines. It was actually quite easy to get ESXi up and running and start creating VMs. For the past several months, my home network has been completely routed through the server (it has dual Ethernet, so I’m using pfSense in a VM as my firewall/NAT/DHCP/etc), and I’ve spun up several VMs (mostly Ubuntu) for things like GitLab and Minecraft.

However, there are a few things that I don’t quite like. I did have an incident following a power outage after my free trial of ESXi had expired but before I inputted my free license key in the UI. This resulted in my pfSense VM not auto-booting and due to some poor configuration on my part, I was unable to access the ESXi web UI to enter the license key without resetting the network settings through the ESXi console. This brings me to my second gripe: the ESXi web UI is very buggy and overall pretty awful to use. Certain pages have to be reloaded to work properly, dialogs are randomly empty, etc. Thirdly, I’ve found myself creating a “general purpose” VM that I can SSH into remotely. While there’s nothing explicitly wrong with this, it just doesn’t feel quite right to me to have a general purpose server that is completely parallel to my other server VMs.

As a result of these shortcomings and learnings, I have decided to embark upon a journey towards further simplification and reliability. I’ll be replacing ESXi with FreeBSD, a rock-solid operating system. Rather than running a utility VM, I’ll simply have the FreeBSD system on the server itself as a “base of operations.”

I plan to learn more about and use several tools during this process. Currently, I only have one 2 TB drive installed. I plan to add a second one and use zfs to create a mirrored vdev pool for redundancy. This will make me feel a lot better about using my server as a backup destination. Of course, this in itself is not a complete backup solution, but it’s a significant step forward from just relying on a single disk. Rather than running pfSense in a VM, I plan to just use the ISC DHCP server from the ports collection and use the built-in pf firewall to accomplish just about everything I was using pfSense for. I’ll likely also end up running a BIND DNS server for a few local network things.

I am still learning about jails in FreeBSD, but I think they could replace a few of the VMs I have currently, such as the Minecraft and GitLab servers. I plan to use bhyve to run things like Windows VMs for pen testing that jails are clearly not suited for.

I’ve used FreeBSD as my desktop OS in the past, and really love how it feels compared with GNU/Linux. Everything just seems more straightforward, and I was surprised to find that things like graphics drivers Just Work™ under FreeBSD where they require a lot of ugly finagling under Linux. I’m quite looking forward to using FreeBSD more frequently, and gaining more depth in some of its great tools like jails and pf.

To start making the transition (which might be a little painful), I’ve installed a fresh copy of FreeBSD 11.2 on a currently-unused machine to start poking around with zfs configurations, jails, and bhyve. This will give me the foundation I need to effectively set up my top-level environment and hopefully get it mostly right the first time. Incidentally, I’m also about halfway through reading The Book of PF from No Starch Press, which will no doubt be helpful in my transition from pfSense to pure pf.

I intend to update this page with notes as I continue on my FreeBSD journey. Stay tuned!

Whitelisting Tor on CloudFlare

2016-04-08T00:00:00Z

On March 30th, 2016, CloudFlare posted a blog entry entitled “The Trouble with Tor” outlining the issues CloudFlare has with serving clients’ sites to Tor users. The Tor project quickly followed it up with their own post, “The Trouble with CloudFlare”, which presented an analysis of the situation from Tor’s perspective.

CloudFlare’s post acknowledged that Tor does play an important role on the internet, but presented the irrelevant conclusion that, of “Security, Anonymity, Convenience: Pick Any Two,” security and convenience will necessarily be the choices of their customers. Certainly, all three properties are important, but not all of their customers’ sites will be subject to the same risks.

I use CloudFlare’s services on several sites, including this one. On some of my sites, I do rely on CloudFlare to provide some measure of security, particularly ones with dynamic content. However, for a site like this one that is entirely static, I have nothing to gain from hiding my content due to a perceived security threat. Everything on this site is considered public, and there are no attack vectors that are prevented through CloudFlare doing browser verification.

On the other hand, anonymity is quite important to me. Where it does not present a security risk to disable CloudFlare’s browser verification, I have chosen to whitelist Tor users on this site. There is little to be lost from bots or spammers accessing this site at will, and there is much to be gained from ensuring that people who consider their privacy important are able to access content without undue hindrance.

CloudFlare does provide an easy way to whitelist all Tor traffic, and they even presented it in their original blog post. To whitelist Tor, go to the Firewall app in your CloudFlare dashboard and add an Access Rule. Enter T1 as the country code (the special code for Tor), and select Whitelist as the action. Now, Tor users will not be presented with a CAPTCHA when visiting your site.

To see it in action for yourself, download the Tor browser and try visiting your site before and after adding the firewall rule. More information about how CloudFlare handles Tor traffic can be found on their Help Center page.

While whitelisting Tor is not the right solution for every site, I encourage you to consider whether yours is a good candidate. Let me know your thoughts!

Getting Login to Work on Ubuntu 15.04 with NVIDIA Drivers

2015-04-23T00:00:00Z

When I upgraded to Ubuntu 15.04, I was unable to log in. The machine started normally and I was presented with the login window. But when I entered my password, the screen went black for a few moments and then the login screen came back.

Since I’m using an NVIDIA GeForce GTX 750, which Ubuntu’s Nouveau drivers don’t support, I previously needed to install the NVIDIA graphics drivers.

By pressing Ctrl + Alt + F3, I was able to drop to a shell. When I checked /var/log/Xorg.0.log, I found a message stating that the NVIDIA driver had failed to load the GLX module, despite earlier messages that it had been loaded. The message also recommended reinstalling the NVIDIA driver.

In the same shell, I ran:

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/349.16/NVIDIA-Linux-x86_64-349.16.run
chmod u+x NVIDIA-Linux-x86_64-349.16.run
sudo service lightdm stop
sudo ./NVIDIA-Linux-x86_64-349.16.run

After that, restarting my computer cleared up the issue.

How to Reset a Lost Password on a LUKS-Encrypted Disk in Ubuntu Linux

2015-03-28T00:00:00Z

Here’s the situation I recently found myself in:

Ubuntu Linux 14.10
Unknown password for user account
Unknown (but set) root password (Ubuntu’s philosophy is to use sudo for everything)
LUKS encrypted filesystem (known passphrase)
Physical access to the computer

I needed to reset my account password. Normally, with physical access to a machine, all bets are off when it comes to security. I tried booting up the machine into recovery mode by holding down shift as soon as the BIOS had finished loading. But when I selected the “Drop to root shell” option, I was prompted to enter the unknown root password.

My second approach was to boot into single user mode by editing the GRUB command script.

By going down to the recovery mode option and hitting e, you can edit the GRUB commands. By adding init=/bin/bash at the end of the line beginning with linux that specifies the boot image, you can specify an initial shell to use. Then I hit F10 to boot.

After waiting for about 30 seconds or a minute, I saw a message that waiting for the root device (the locked disk) had timed out. I was then dumped into an initramfs shell. From there, I was able to unlock the disk by running cryptsetup luksOpen /dev/sda3 sda3_crypt.

Next, I mounted the freshly-unlocked disk with mount -o rw /dev/mapper/sda3_crypt /root (the unlocked device appears under /dev/mapper with the name given to luksOpen), taking advantage of the pre-existing empty directory. From there, I used chroot to run passwd in the OS.

$ chroot /root passwd
$ chroot /root passwd myUserName

By running these commands, I successfully reset both the root password and the password for my account. From there, I was able to restart the machine and boot normally.

Your Website is not Special, Don’t Make Visitors Make Accounts

2015-01-16T00:00:00Z

One of my pet peeves in website usability design is forcing people to create unnecessary accounts. My recent purchase of some concert tickets from Ticketfly required me to make an account to buy them. For people who buy a lot of concert tickets, having an account may make a lot of sense. But for me, as someone who buys concert tickets at most once every year or two, having an account on a site that I will probably only use once is not only unnecessary, it’s annoying.

This is not to say that you shouldn’t offer accounts; that would be ridiculous (depending on the type of site you are running, of course). However, in general, your users know far better than you do whether or not they actually want or will use an account. Forcing them to create an account will only drive them away. People don’t like creating accounts they don’t want to have. There’s really no reason you can’t have a “check out as guest” option.

And if you do offer accounts, here are a couple of rules to follow to ensure a good user experience:

Allow the option of using a 3rd-party identity provider (OpenID, Facebook, Google, etc.). Often, visitors don’t want to have yet another username/password to remember.
Don’t force visitors to use a 3rd-party provider. Always have a local option. As a counterpoint to the previous rule, many visitors won’t want to use their Facebook/Google accounts for authenticating to other sites.
Username = Email. Don’t make people remember a username for your site. You may allow them to pick a username later on that can be used in lieu of their email address, e.g. as the URL for a profile page, but don’t force them to use a username to log in.
Don’t make complicated password rules. If you do have password requirements, show them to the user before they try to make a password. Only telling them when their password doesn’t fit your requirements causes consternation.
Never ever limit how long a password can be (within reason, obviously you don’t want to be receiving a megabyte long password). My bank limits passwords to 14 characters, which is rather absurd. Since you’re hashing your passwords anyway, it’s not like you need to allocate extra memory in your tables to store longer passwords.
Always allow your users to close their account. This should remove all information about them from your service to the extent possible without disrupting the integrity of other information.

Of course, there are technical details that you need to be watching out for that are outside the scope of this post. I’ll leave it to you to make sure your implementation is secure and robust, but I’ll leave you with a few general tips:

Don’t invent your own crypto. This applies to protocols, hashing, encryption, everything.
Use bcrypt.
Using unsecured HTTP (no SSL/TLS) is inexcusable.
Don’t invent your own crypto.
Don’t invent your own crypto.
Use bcrypt.

Using Showoff for Markdown Presentations

2014-12-14T00:00:00Z

Recently, I had to give a presentation and decided to do some research on using Markdown. By coincidence, I had also been looking into Puppet, a flexible and powerful configuration manager, when I stumbled across Showoff, another Puppet Labs project.

Showoff is a Ruby application that takes a Markdown file with some special formatting and transforms it into a web-accessible slideshow. As expected, you can open up a presenter view in your browser. You can also easily open up a second window to use on your projector in full screen. You can even give your audience the address for the server so they can follow along on their own screens.

There are also some nice audience interactivity features, like the ability to ask questions through the web interface. These questions will be shown on the presenter’s screen. Audience members also have the ability to indicate whether the presenter is moving too quickly or too slowly so that an adjustment can be made accordingly.

Finally, Showoff is designed with software presentations in mind, with the ability to dynamically run Ruby, JavaScript, or Coffeescript code included in your slides. You can attach other files or labs to your slides, so audience members following along on their own devices can easily access reference materials at the appropriate time.

For a small presentation like the one I was doing, a lot of the more advanced features of Showoff would have been overkill, but it still made an awesome presentation method. It was also really neat to be able to say that the slides were available on GitHub if anyone wanted to look at them afterwards.

Configuring CloudFlare’s Universal SSL

2014-10-11T00:00:00Z

On September 29, 2014, CloudFlare, a web security company and CDN provider, announced that they would begin offering free, automatic SSL to all their customers (including those on their free plan). This is an enormous step forward for enhancing security and privacy on the Internet; while website owners would previously need to purchase an SSL certificate for their site and often pay extra for SSL hosting, CloudFlare now makes this all free. Plus, you get the benefits of their other services such as DDoS protection.

I’ve previously written about hosting static sites with GitHub Pages, which is what I use for www.benburwell.com. GitHub provides SSL hosting for its static sites, but not with custom domain names (e.g. https://example.github.io but http://example.com). Using CloudFlare, it’s possible to use https://example.com for free. And as a bonus, you won’t need to worry about DNS hosting either.

What is CloudFlare?

CloudFlare works by having all of the traffic for your site routed through CloudFlare’s network, which provides CDN services such as caching of static resources, as well as security options like DDoS protection and a Web Application Firewall (WAF). You’ll need to import your DNS records to CloudFlare and specify CloudFlare’s DNS servers with your domain registrar to facilitate the service. Other nice features include apex CNAME records using the @ character (traditionally challenging), as well as IPv6 DNS support.

Setting Up Free, Universal SSL with GitHub Pages

(Note: you can really do this with any host, but I’m going to be describing how I did this with my site.)

To get started, head over to CloudFlare and create an account. Next, you’ll specify the website you want to use CloudFlare with (be sure to use your custom DNS name, not you.github.io). You’ll have to wait for a few minutes as CloudFlare scrapes your DNS records. Be sure all of them are there, as any that aren’t will cease to be valid once you enable CloudFlare.

Next, head over to your registrar and change your authoritative name servers to the ones listed in CloudFlare to start routing your traffic through their network. This will take some time to propagate through the DNS network, but should be effective within a few hours. In the meantime, you can take a look at the three Settings pages. There are many options for optimization, redirects, caching, security, and more. The important one is to go down to the SSL option and set it to Flexible SSL. Note that even though you can access your GitHub Pages site over SSL, trying to do so with full SSL through CloudFlare will result in an “Unknown Site” error from GitHub.

Update on 22 May, 2015: Since this article was published, CloudFlare has updated their dashboard. Now, the settings for SSL are located under the “Crypto” tab for your website. The page rules as described below are still configured the same way, but now found under the “Page Rules” tab.

On the free tier, CloudFlare states that it will take up to 24 hours to provision the SSL certificate for your site. In my case, it only took a few hours. Using one of their paid plans will result in immediate provision. You can check in on whether the certificate has been provisioned by trying to navigate to https://yoursite.com. You’ll likely get a domain mismatch SSL error as CloudFlare defaults to a different certificate until yours has been provisioned. Once you stop receiving the error, you’re good to go!

The final step is to set up Page Rules (of which you get three for free) to redirect visitors to the non-secure site to the SSL one. Go to My Websites and click Page Rules under the gear icon. Enter the URL patterns to match and flip the “Always use https” to ON.

That’s it! You’ve taken an important step towards making the web browsing experience more secure and private for your visitors.

LESS File Compilation for Jekyll and GitHub Pages

2014-05-31T00:00:00Z

I recently wrote about migrating my website to GitHub Pages and noted that I wasn’t completely satisfied with my deployment workflow. Ideally, creating a build should be done in a single step. As I wrote, my previous build workflow required me to manually compile my LESS files before committing if I’d made changes. While my stylesheet doesn’t change often, this method is certainly not ideal.

Using Git hooks, it’s possible to run a script at certain points during the Git workflow. To take advantage of this in my case, I added a small shell script to .git/hooks/pre-commit:

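#!/bin/sh
# Recompile the LESS stylesheet and stage the result before each commit.

export PATH=/usr/local/bin:$PATH
cd /Users/Ben/Documents/Code/benburwell.github.io/assets/less || exit 1
# Exit non-zero if compilation fails so Git aborts the commit.
lessc --clean-css style.less ../css/style.css || exit 1
cd /Users/Ben/Documents/Code/benburwell.github.io || exit 1
git add /Users/Ben/Documents/Code/benburwell.github.io/assets/css/style.css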

This is a pretty rough script, but it gets the job done for me. For a much more thorough script, see this article by TJ VanToll.
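One way the hook could be made less rough is to ask Git for the repository root instead of hard-coding it. The following is a minimal Python sketch of that idea, not what I actually run. It assumes lessc and git are already on the PATH and that the stylesheet lives at the same relative paths as above; the function name hook_commands is made up for illustration.

```python
#!/usr/bin/env python3
import subprocess
import sys
from pathlib import Path


def hook_commands(repo_root):
    """Return the commands the hook runs, as argument lists."""
    root = Path(repo_root)
    source = root / "assets" / "less" / "style.less"
    target = root / "assets" / "css" / "style.css"
    return [
        ["lessc", "--clean-css", str(source), str(target)],  # compile and minify
        ["git", "add", str(target)],                         # stage the result
    ]


def main():
    # Ask Git where the working tree is instead of hard-coding the path.
    repo_root = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    for command in hook_commands(repo_root):
        # check=True raises on failure, so a broken stylesheet aborts the commit.
        subprocess.run(command, check=True, cwd=repo_root)


# Only do the work when Git invokes this file as the pre-commit hook.
if __name__ == "__main__" and Path(sys.argv[0]).name == "pre-commit":
    main()
```

Saved as .git/hooks/pre-commit and marked executable, this would behave like the shell script above while still working if the repository is moved.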

Enhancing Printing at Muhlenberg

2014-05-03T00:00:00Z

A common frustration of Muhlenberg students is to print a document to a dorm printer only to find that the printer had no paper when going to collect it. This leads to both frustration and wasted paper, since when more paper is put into the printer, it will print out all the queued jobs from when the tray was empty. By that time, students have often given up and printed their document to another printer.

To avoid this, I created a web page that reports the status of Muhlenberg printers. The PHP script queries the printers to determine the status of their trays. If you’d like to see other printers added, let me know by email or on Twitter.

DNS Names

To facilitate printing from personal computers, I created DNS records for several printers which enable them to be configured with a logical name rather than by IP address. Currently, the following printers/DNS names are available:

trumbower48.print.muhlenberg.benburwell.com
trumbower125.print.muhlenberg.benburwell.com
trumbower147.print.muhlenberg.benburwell.com

Migrating to GitHub Pages and Jekyll

2014-05-01T00:00:00Z

I’ve always been a fan of using Markdown to create web content. Several years ago, I created MDEngine, a small PHP script to render Markdown files in HTML dynamically. For a while, it was responsible for much of the content on my website. In October 2013, I began work on a fresh design. I decided to use a custom Node.js app deployed on Heroku for processing the Markdown. While this worked effectively, I always had some reservations.

While my site was decently fast, there was no real reason that it needed to be dynamically generated. I was particularly concerned with the performance of the two list pages, whose backend logic consisted of parsing an entire directory of Markdown files each time it was loaded. Though there was no noticeable performance impact, it was not inconceivable that the page generation time would increase substantially as content grew.

In late April 2014, I made some design updates to the site running on Heroku. I decided to take the opportunity to address my performance concerns as well. While my original intent was to simply clean up the server logic I had written, I realized that it would be more sustainable in the long term to migrate to a true static site using Jekyll.

The Setup

Installing Jekyll locally was a piece of cake; simply running gem install jekyll did the trick. I already had a placeholder page in my benburwell.github.io repo, so I cd’d to the parent directory and ran jekyll new benburwell.github.io to overwrite the old content.

For those unfamiliar with GitHub Pages, anything that you put in a repo named [your username].github.io will automatically be served from that URL. You can also create branches named gh-pages in your other repos to serve project-specific sites. In addition to serving static content, GitHub Pages will automatically compile sites generated with Jekyll.

Porting Content

Next came what was probably the most time-consuming part of the whole process: converting the Jade layout into pure HTML with Liquid markup. Luckily, this wasn’t too painful, and I came out with two layouts: one for page structure and navigation, and the other for displaying posts.

My next challenge was to maintain my link structure so nothing would be broken. The one exception I conceded to was my résumé, a PDF file that I had been serving from /resume/ using Express (admittedly a pretty poor idea). After exploring the Jekyll documentation, I discovered that an easy way to separate out my content into Writing and Projects as I’ve done on my site was to use the built-in category functionality. I would simply create two category pages at /writing/index.html and /projects/index.html to render a list of posts from their respective categories, and tag each Markdown document with the appropriate category. The final step was to define my permalink structure in _config.yml which I did by adding permalink: /:categories/:title/ to the file.

I next had the pleasure of renaming all of my content files to adhere to Jekyll’s naming convention (YYYY-MM-DD-hyphen-separated-title.markdown) and adding/modifying the front matter as necessary.
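To show how the naming convention and the permalink setting fit together, here is a small Python sketch that imitates the /:categories/:title/ pattern for a post with a single category. It is not Jekyll’s own code, just an approximation of the mapping, and the filename in the usage note is a hypothetical example.

```python
import re

# Jekyll post filenames: YYYY-MM-DD-hyphen-separated-title.markdown
FILENAME_PATTERN = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(markdown|md)$")


def permalink(filename, category):
    """Approximate the URL produced by permalink: /:categories/:title/."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a Jekyll post filename: {filename}")
    slug = match.group(4)  # the title part of the filename, without the date
    return f"/{category}/{slug}/"
```

For example, permalink("2014-05-01-migrating-to-github-pages.markdown", "writing") gives /writing/migrating-to-github-pages/, so the date in the filename never appears in the URL.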

Additional Configuration

I decided to enable the jekyll-sitemap plugin by adding jekyll-sitemap as a gem to _config.yml. This plugin will generate an XML sitemap that can be used by crawlers such as those run by search engines to help determine what content needs to be indexed.

I moved my error page over and quickly translated the Jade to Markdown by following the instructions provided by GitHub for creating a custom 404 page. The only remaining issue was my stylesheet problem. In my Express app, I used Less for writing my stylesheets. As of this writing, Jekyll does not support compiled stylesheet languages like Less, though there is the suggestion of future support for Sass and CoffeeScript.

For now, I’m keeping my stylesheets in /assets/less/ and compiling them down to a CSS file locally after making changes with lessc --clean-css style.less ../css/style.css. While this certainly isn’t perfect, it allows me to keep my Less files intact and to serve minified CSS.

Conclusion

All in all, the process went very smoothly. I made the first Jekyll commit at 18:52 and changed my DNS records from Heroku at 21:20, spending about two and a half hours learning Jekyll and converting my site over. This is a pretty rapid deployment — kudos to Jekyll for building such an easy tool.

As far as the future goes, I’d like to see GitHub Pages provide native support for a stylesheet language, be it Less, Sass, or some other one. Additionally, I’d like to see an HTML minification plugin (a minor optimization, but not unreasonable). For the time being, I’m quite happily serving this site with GitHub Pages.

Ben Burwell

Avoid speculative error handling

How I Connect to Postgres Databases

Using .pg_service.conf

Storing passwords in .pgpass

Port forwarding with SSH

Using ~/.ssh/config

Headless ssh with a control socket

Tying it all together

Further Reading

Transactions Are Not Locks

Preventing concurrency bugs

check constraints

Table locks

Row locks

Transaction isolation levels

Flame Graphs for Go With pprof

Profiling our web server

Make it faster!

tl;dr: How to Make a Flame Graph from a pprof source

Contributing to the aerc email client

Maildir backend

:unsubscribe command

Address book integration

Lutron Universal Wireshark

Intercepting Go TLS Connections with Wireshark

How decrypting TLS in Wireshark works

Configuring Go to use a TLS Key File

Debugging HTTP services with mitmproxy

Learning About Syscall Filtering With Seccomp

How to Add Row Level Security to Views in PostgreSQL

MIG welding

Safety

Mechanics

Tack Welds

Beads

Fillets

“Series of tacks”

Fill-in

How the Dewey Decimal Classification Works

Solving the SQL Murder Mystery

(Almost) Pure CSS Material-like Text Fields

Buzzword-Driven “Pop Infosec”

“The Cloud”

“High Severity Vulnerability”

In Conclusion

Vim vs Neovim on FreeBSD

FreeBSD Jail Networking Continued

How does DHCP work?

Step 1: Discovery

Step 2: Offer

Step 3: Request

Step 4: Acknowledgement

FreeBSD Experiment 1: Jails

Notes on setting up a FreeBSD home server

Whitelisting Tor on CloudFlare

Getting Login to Work on Ubuntu 15.04 with NVIDIA Drivers

How to Reset a Lost Password on a LUKS-Encrypted Disk in Ubuntu Linux

Your Website is not Special, Don’t Make Visitors Make Accounts

Using Showoff for Markdown Presentations

Configuring CloudFlare’s Universal SSL

What is CloudFlare?

Setting Up Free, Universal SSL with GitHub Pages

LESS File Compilation for Jekyll and GitHub Pages

Enhancing Printing at Muhlenberg

DNS Names

Migrating to GitHub Pages and Jekyll

The Setup

Porting Content

Additional Configuration

Conclusion
