<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" >
  <link href="https://www.benburwell.com/posts/index.xml" rel="self" type="application/atom+xml" />
  <link href="https://www.benburwell.com/posts/" rel="alternate" type="text/html" />
  <updated>2021-10-29T09:18:55-04:00</updated>
  <id>https://www.benburwell.com/feed.xml</id>
  <title type="html">Ben Burwell</title>
  
  <entry>
    <title type="html">Avoid speculative error handling</title>
    <link href="https://www.benburwell.com/posts/avoid-speculative-error-handling/" rel="alternate" type="text/html" title="Avoid speculative error handling" />
    <published>2023-01-04T00:00:00Z</published>
    <updated>2023-01-04T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/avoid-speculative-error-handling/</id>
    <content type="html" xml:base="/posts/avoid-speculative-error-handling/">
      <![CDATA[
        <p>A general heuristic I’ve learned from reading, writing, and debugging a lot of
code is <strong>don’t try to predict whether a future operation will fail.</strong> This
comes up a lot when handling error conditions.</p>
<p>For example, I’ve seen plenty of code that reads something like this
(pseudocode):</p>
<pre tabindex="0"><code>if not file_exists(&#34;input&#34;):
  print(&#34;Input file not found&#34;)
else:
  input = open(&#34;input&#34;)

# or...

if not server_online(&#34;192.168.32.7&#34;):
  print(&#34;Server is offline&#34;)
else:
  get_data(&#34;192.168.32.7&#34;)
</code></pre><p>It’s great to show friendly error messages like “Input file not found” or
“Server is offline”, but the order of operations above is buggy!</p>
<p>The problem with the first case is that even if the file exists when it’s first
checked, someone could delete or move it before the second statement executes.
In the second case, a network cable could be unplugged.</p>
<p>It would be better to write the code as:</p>
<pre tabindex="0"><code>try input = open(&#34;input&#34;)
catch FileNotFound:
  print(&#34;Input file not found&#34;)

# and

try get_data(&#34;192.168.32.7&#34;)
catch ServerOffline:
  print(&#34;Server is offline&#34;)
</code></pre><p>Instead of checking whether something might not work before deciding whether to
try at all, it’s usually better to just <em>do the thing</em> with the expectation that
it might not work.</p>
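<p>As a concrete version of the pseudocode above, here’s a minimal Python sketch. The <code>read_input</code> name and its return-<code>None</code>-on-failure convention are my own assumptions for illustration; the point is only that the error is handled where <code>open</code> actually fails, with no existence check beforehand.</p>

```python
def read_input(path):
    """Read a file, reporting a friendly error if it is not there."""
    try:
        # Just try the operation; there is no separate "does it exist?" step
        # that could go stale between the check and the open.
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print("Input file not found")
        return None
```

<p>Because the check and the operation are the same step, there’s no window in which the file can disappear between them.</p>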
<p>There are situations where it really does make sense to try to catch things
early, but it seems much more common for this type of “speculative error
handling” to cause problems than provide solutions.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      A general heuristic I’ve learned from reading, writing, and debugging a lot of code is don’t try to predict whether a future operation will fail. This comes up a lot when handling error conditions.
For example, I’ve seen plenty of code that reads something like this (pseudocode):
if not file_exists(&#34;input&#34;): print(&#34;Input file not found&#34;) else: input = open(&#34;input&#34;) # or... if not server_online(&#34;192.168.32.7&#34;): print(&#34;Server is offline&#34;) else: get_data(&#34;192.168.32.7&#34;) It’s great to show friendly error messages like “Input file not found” or “Server is offline”, but the order of operations above is buggy!
    </summary>
  </entry>
  
  <entry>
    <title type="html">How I Connect to Postgres Databases</title>
    <link href="https://www.benburwell.com/posts/how-i-connect-to-postgres-databases/" rel="alternate" type="text/html" title="How I Connect to Postgres Databases" />
    <published>2022-12-13T15:22:29Z</published>
    <updated>2022-12-13T15:22:29Z</updated>
    <id>https://www.benburwell.com/posts/how-i-connect-to-postgres-databases/</id>
    <content type="html" xml:base="/posts/how-i-connect-to-postgres-databases/">
      <![CDATA[
        <p>I often need to connect to PostgreSQL databases for projects I'm working on, and
over time I've developed a method that works pretty well for me. It's pretty
specific to how I like to work so I wouldn't recommend it for everyone. But
since some of my coworkers have asked about it, I figured I'd write down the
major pieces of the puzzle so others can adapt any parts they like to their own
workflows.</p>
<p>For starters, I almost exclusively use the <a href="https://www.postgresql.org/docs/current/app-psql.html"><code>psql</code></a> command line client.
If you don't use <code>psql</code>, then most of this is probably not relevant to you.
Otherwise, keep reading!</p>
<p>Throughout this page, I'll pretend that there's a database server that we want
to connect to called <code>db1.internal.net</code> that listens on the default port of
5432.</p>
<h2 id="using-pg_serviceconf">Using <code>.pg_service.conf</code></h2>
<p>When you use <code>psql</code> to connect, you can pass a connection URL, CLI flags, or
a string of <code>libpq</code> options:</p>
<pre tabindex="0"><code>$ psql &#39;host=db1.internal.net user=app dbname=db password=sesame port=5432&#39;
</code></pre><p>If you're frequently connecting to the same database, it can be a little
annoying to constantly type in all those parameters or try to find them in your
shell history. To make life easier, you can put the <code>libpq</code> options for your
frequently-used connections into a service file.</p>
<p>By default, <code>libpq</code> tries to load service definitions from <code>~/.pg_service.conf</code>.
This file uses an INI-style format, and can be populated with services like
this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="na">[db1]   # &lt;-- name of the service</span>
</span></span><span class="line"><span class="cl"><span class="na">host</span><span class="o">=</span><span class="s">db1.internal.net</span>
</span></span><span class="line"><span class="cl"><span class="na">port</span><span class="o">=</span><span class="s">5432</span>
</span></span><span class="line"><span class="cl"><span class="na">user</span><span class="o">=</span><span class="s">app</span>
</span></span><span class="line"><span class="cl"><span class="na">dbname</span><span class="o">=</span><span class="s">db</span>
</span></span><span class="line"><span class="cl"><span class="na">password</span><span class="o">=</span><span class="s">sesame</span>
</span></span></code></pre></div><p>Once you create this file, you can connect with <code>psql</code> by simply referencing the
service name:</p>
<pre tabindex="0"><code>$ psql &#39;service=db1&#39;
</code></pre><h2 id="storing-passwords-in-pgpass">Storing passwords in <code>.pgpass</code></h2>
<p>If you need to use password authentication for your database, and you don't want
to keep your passwords in <code>~/.pg_service.conf</code>, you can use a separate
<code>~/.pgpass</code> file.</p>
<p>Each line of this password file describes the password to use for a particular
database connection or connections. The entry format is colon-delimited
<code>host:port:database:username:password</code>. For example, to connect to our <code>db1</code>
service, you could remove <code>password=sesame</code> from <code>~/.pg_service.conf</code> and
instead add the following line to <code>~/.pgpass</code>:</p>
<pre tabindex="0"><code>db1.internal.net:5432:db:app:sesame
</code></pre><p>You can also use <code>*</code> as a wildcard for any of the fields, e.g.
<code>db1.internal.net:5432:*:app:sesame</code> means to use password <code>sesame</code> to connect
as the <code>app</code> user to <em>any</em> database on <code>db1.internal.net:5432</code>.</p>
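<p>To make the matching rules concrete, here's a rough Python sketch of how a lookup against this file behaves. This is not <code>libpq</code>'s actual code: <code>find_password</code> is a hypothetical helper, and it deliberately ignores backslash-escaped colons. It does reflect two documented behaviors: lines starting with <code>#</code> are comments, and the first matching line wins.</p>

```python
def find_password(pgpass_text, host, port, database, user):
    """Return the password from the first matching line, or None."""
    wanted = (host, str(port), database, user)
    for line in pgpass_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) != 5:
            continue  # simplification: escaped colons are not handled
        *patterns, password = fields
        # Each of the first four fields must match exactly or be a wildcard.
        if all(p == "*" or p == w for p, w in zip(patterns, wanted)):
            return password
    return None
```

<p>Since the first match wins, more specific lines should come before wildcard lines. Also note that <code>libpq</code> ignores the password file unless its permissions are restricted to your user (<code>chmod 0600 ~/.pgpass</code>).</p>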
<h2 id="port-forwarding-with-ssh">Port forwarding with SSH</h2>
<p>Often, the databases you need to connect to aren't directly available, and you
need to connect through a bastion host of some kind. For example, maybe we can
only connect to <code>db1.internal.net</code> after we SSH into an internal network.</p>
<p>For the sake of example, we'll imagine that there is a server called
<code>ssh.public.net</code> that we can SSH into when we want to connect to our <code>db1</code>
service.</p>
<p>We can forward a local port through an SSH tunnel by passing the <code>-L</code> option to
<code>ssh</code>:</p>
<pre tabindex="0"><code>$ ssh -L 15432:db1.internal.net:5432 ben@ssh.public.net -p 2222
</code></pre><p>This will connect to <code>ssh.public.net</code> on port 2222, and then set up a socket
on your local machine bound to port 15432, and any connections you make to that
port will be forwarded over the SSH channel to <code>db1.internal.net:5432</code> from the
remote machine you're SSH'd into.</p>
<p>This means that we can now connect to <code>db1.internal.net</code> using <code>psql</code> by making
a connection to <code>localhost:15432</code>. This can be wrapped up as an entry in your
<code>~/.pg_service.conf</code> file where instead of listing <code>db1.internal.net:5432</code>, you list
<code>localhost:15432</code>:</p>
<pre tabindex="0"><code>[db1]
host=localhost
port=15432
user=app
dbname=db
password=sesame
</code></pre><h2 id="using-sshconfig">Using <code>~/.ssh/config</code></h2>
<p>Instead of needing to remember to use <code>ben</code> as the username for
<code>ssh.public.net</code>, and that <code>sshd</code> is actually listening on port 2222, you can
add an entry to <code>~/.ssh/config</code> similar to the way the Postgres service file
works:</p>
<pre tabindex="0"><code>Host ssh.public.net
  User ben
  Port 2222
</code></pre><p>Now, you can omit the username and port and simply run:</p>
<pre tabindex="0"><code>$ ssh -L 15432:db1.internal.net:5432 ssh.public.net
</code></pre><p>You can actually make the <code>Host</code> label anything you want; it doesn't need to be
the real name of the server. This can be useful if you don't actually have a DNS
name to connect to and you don't want to remember the IP address:</p>
<pre tabindex="0"><code>Host my-internal-net
  User ben
  Port 2222
  HostName 192.0.32.7
</code></pre><h2 id="headless-ssh-with-a-control-socket">Headless ssh with a control socket</h2>
<p>So now we can connect fairly easily to our database:</p>
<ol>
<li>Run <code>ssh -L 15432:db1.internal.net:5432 ssh.public.net</code>.</li>
<li>In a separate window, run <code>psql service=db1</code>.</li>
<li>When you are done with your <code>psql</code> session, use <code>^D</code> to log out from the SSH
connection.</li>
</ol>
<p>This works pretty well, but for frequently used connections, it'd be even nicer
to just have one command to run and not need to deal with multiple shell
sessions.</p>
<p>Luckily, <code>ssh</code> connections can be controlled headlessly through a Unix control
socket. Here's what this looks like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl">$ ssh -M -S conn.sock -fnNT -L 15432:db1.internal.net:5432 ssh.public.net
</span></span><span class="line"><span class="cl">$ psql <span class="nv">service</span><span class="o">=</span>db1
</span></span><span class="line"><span class="cl">$ ssh -S conn.sock -O <span class="nb">exit</span> ssh.public.net
</span></span></code></pre></div><p>In the first command, we establish the SSH connection and specify <code>conn.sock</code> as
the control socket for connection sharing. We also use the <code>-f</code> option so that
ssh will go to background just before command execution. (You can read more
about the other options in <a href="https://linux.die.net/man/1/ssh">the <code>ssh(1)</code> manpage</a>, but they basically prevent SSH
from starting an actual console session on the remote host so we're <em>only</em> doing
the port forwarding.)</p>
<p>Once the connection is established, we can run <code>psql</code> as usual, and forward our
Postgres traffic over the established SSH connection.</p>
<p>Finally, when we're done with <code>psql</code>, we can have <code>ssh</code> send the <code>exit</code> control
command over <code>conn.sock</code> to close the SSH connection.</p>
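<p>One thing the three commands above don't guarantee is that the tunnel gets closed if something goes wrong in between. Here's a sketch of the same open/use/close sequence as a Python context manager, which is one way to get that guarantee. The function names and the injectable <code>run</code> parameter are assumptions for illustration; the <code>ssh</code> flags are exactly the ones used above.</p>

```python
import subprocess
from contextlib import contextmanager


def open_tunnel_argv(socket, local_port, db_host, db_port, ssh_host):
    """Build the ssh command that opens a backgrounded, forwarding-only tunnel."""
    forward = f"{local_port}:{db_host}:{db_port}"
    return ["ssh", "-M", "-S", socket, "-fnNT", "-L", forward, ssh_host]


def close_tunnel_argv(socket, ssh_host):
    """Build the ssh command that sends 'exit' over the control socket."""
    return ["ssh", "-S", socket, "-O", "exit", ssh_host]


@contextmanager
def tunnel(socket, local_port, db_host, db_port, ssh_host, run=subprocess.run):
    # If the tunnel cannot be opened, fail before running anything else.
    run(open_tunnel_argv(socket, local_port, db_host, db_port, ssh_host), check=True)
    try:
        yield
    finally:
        # Always ask the master connection to exit, even after an error.
        run(close_tunnel_argv(socket, ssh_host), check=False)
```

<p>With this, something like <code>with tunnel("conn.sock", 15432, "db1.internal.net", 5432, "ssh.public.net"): subprocess.run(["psql", "service=db1"])</code> would open the tunnel, run <code>psql</code>, and close the tunnel afterwards.</p>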
<h2 id="tying-it-all-together">Tying it all together</h2>
<p>I tend to wrap all of this up in a short shell script named something like
<code>db1-psql</code>. The scripts look pretty much like what I described above:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl"><span class="cp">#!/bin/sh
</span></span></span><span class="line"><span class="cl"><span class="cp"></span>
</span></span><span class="line"><span class="cl"><span class="nv">SOCKET</span><span class="o">=</span>db1-ssh.sock
</span></span><span class="line"><span class="cl"><span class="nv">LOCAL_PORT</span><span class="o">=</span><span class="m">15432</span>
</span></span><span class="line"><span class="cl"><span class="nv">REMOTE_DB_HOST</span><span class="o">=</span>db1.internal.net
</span></span><span class="line"><span class="cl"><span class="nv">REMOTE_DB_PORT</span><span class="o">=</span><span class="m">5432</span>
</span></span><span class="line"><span class="cl"><span class="nv">SSH_HOST</span><span class="o">=</span>ssh.public.net
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">ssh -M -S <span class="s2">&#34;</span><span class="nv">$SOCKET</span><span class="s2">&#34;</span> -fnNT -L <span class="s2">&#34;</span><span class="nv">$LOCAL_PORT</span><span class="s2">:</span><span class="nv">$REMOTE_DB_HOST</span><span class="s2">:</span><span class="nv">$REMOTE_DB_PORT</span><span class="s2">&#34;</span> <span class="s2">&#34;</span><span class="nv">$SSH_HOST</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">psql <span class="nv">service</span><span class="o">=</span>db1
</span></span><span class="line"><span class="cl">ssh -S <span class="s2">&#34;</span><span class="nv">$SOCKET</span><span class="s2">&#34;</span> -O <span class="nb">exit</span> <span class="s2">&#34;</span><span class="nv">$SSH_HOST</span><span class="s2">&#34;</span>
</span></span></code></pre></div><p>With this in place, and the <code>db1-psql</code> script in my <code>$PATH</code> (usually for me this
means dropping it in <code>~/.bin/</code>), I can connect to the database by simply
running:</p>
<pre tabindex="0"><code>$ db1-psql
</code></pre><p>There are lots of ways to connect to databases, but this is what I've found
works well for me. Feel free to take any bits and pieces of this that you like
and use them in your workflow!</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://www.postgresql.org/docs/current/libpq-pgservice.html"><code>libpq</code> Connection Service File</a></li>
<li><a href="https://www.postgresql.org/docs/current/libpq-pgpass.html"><code>libpq</code> Password File</a></li>
<li><a href="https://www.postgresql.org/docs/15/libpq-connect.html#LIBPQ-PARAMKEYWORDS"><code>libpq</code> Parameter Key Words (<code>host</code>, <code>user</code>, <code>dbname</code>, etc)</a></li>
<li><a href="https://linux.die.net/man/1/ssh"><code>ssh</code> manual page</a></li>
<li><a href="https://linux.die.net/man/5/ssh_config"><code>ssh</code> Configuration File Format</a></li>
</ul>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I often need to connect to PostgreSQL databases for projects I'm working on, and over time I've developed a method that works pretty well for me. It's pretty specific to how I like to work so I wouldn't recommend it for everyone. But since some of my coworkers have asked about it, I figured I'd write down the major pieces of the puzzle so others can adapt any parts they like to their own workflows.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Transactions Are Not Locks</title>
    <link href="https://www.benburwell.com/posts/transactions-are-not-locks/" rel="alternate" type="text/html" title="Transactions Are Not Locks" />
    <published>2022-03-29T00:00:00Z</published>
    <updated>2022-03-29T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/transactions-are-not-locks/</id>
    <content type="html" xml:base="/posts/transactions-are-not-locks/">
      <![CDATA[
        <p>One thing I wish I had understood better earlier on in my experience with
PostgreSQL is how transactions and locks can be used together to provide
serializable logic.</p>
<p>An easy way to illustrate this is with a simple bank account system. Suppose we
create an <code>accounts</code> table and populate it like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="n">name</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">primary</span><span class="w"> </span><span class="k">key</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="n">balance</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="p">(</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">balance</span><span class="p">)</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;A&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;B&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>Now we have two bank accounts, <code>A</code> with a balance of $10, and <code>B</code> with a balance
of $0.</p>
<p>In order to be a <em>useful</em> bank, we want to be able to move money from one
account to another. In pseudocode, the way to move money from one account to
another might look something like:</p>
<pre tabindex="0"><code>function moveMoney(from, to, amount):
  # Start a transaction.
  txn = db.begin()
  # Update the balances.
  txn.execute(&#39;update accounts set balance = balance - $amount where name = $from&#39;)
  txn.execute(&#39;update accounts set balance = balance + $amount where name = $to&#39;)
  # Commit the transaction.
  txn.commit()
</code></pre><p>We use a transaction here to make sure that either both updates succeed, or both
updates fail. In other words, we want to avoid the situation where money is
deducted from <code>A</code> but never deposited to <code>B</code>.</p>
<p>There’s another situation that we might want to avoid in our bank too: we might
want a rule that account balances can never be negative. To enforce this rule,
we can update our <code>moveMoney</code> function:</p>
<pre tabindex="0"><code>function moveMoney(from, to, amount):
  # Moving a negative amount of money from A to B is equivalent to moving the
  # corresponding positive amount from B to A.
  if amount &lt; 0:
    moveMoney(to, from, -1*amount)
    return

  # Start a transaction so that all of our queries/updates succeed or fail as a
  # unit.
  txn = db.begin()

  # Make sure the $from account has a balance of at least $amount.
  currBalance = txn.query(&#39;select balance from accounts where name = $from&#39;)
  if currBalance &lt; amount:
    txn.rollback()
    throw exception

  # Move the money as before.
  txn.execute(&#39;update accounts set balance = balance - $amount where name = $from&#39;)
  txn.execute(&#39;update accounts set balance = balance + $amount where name = $to&#39;)

  # Commit the transaction.
  txn.commit()
</code></pre><p>But there’s a problem with this! Using a transaction only ensures that all of
the writes succeed or fail together. It does <em>not</em> provide any guarantee that
all of the statements in the transaction execute “at the same time” (i.e. at
PostgreSQL’s default isolation level, the transactions are not <em>serializable</em>).</p>
<h2 id="preventing-concurrency-bugs">Preventing concurrency bugs</h2>
<p>Let’s simulate two different actors calling <code>moveMoney('A', 'B', 10)</code>
concurrently, again with <code>A</code> having an initial balance of $10 and <code>B</code> having $0:</p>
<table>
<thead>
<tr>
<th>Actor 1</th>
<th>Actor 2</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>begin</code></td>
<td></td>
</tr>
<tr>
<td><code>select balance from accounts where name = 'A'</code></td>
<td></td>
</tr>
<tr>
<td></td>
<td><code>begin</code></td>
</tr>
<tr>
<td></td>
<td><code>select balance from accounts where name = 'A'</code></td>
</tr>
<tr>
<td><code>update accounts set balance = balance - 10 where name = 'A'</code></td>
<td></td>
</tr>
<tr>
<td><code>update accounts set balance = balance + 10 where name = 'B'</code></td>
<td></td>
</tr>
<tr>
<td><code>commit</code></td>
<td></td>
</tr>
<tr>
<td></td>
<td><code>update accounts set balance = balance - 10 where name = 'A'</code></td>
</tr>
<tr>
<td></td>
<td><code>update accounts set balance = balance + 10 where name = 'B'</code></td>
</tr>
<tr>
<td></td>
<td><code>commit</code></td>
</tr>
</tbody>
</table>
<p>Now, if we check the account balances, we can see a problem:</p>
<pre tabindex="0"><code>postgres=# select * from accounts ;
 name | balance
------+---------
 A    |     -10
 B    |      20
</code></pre><p>Both actors read the initial balance as $10, and therefore allowed the
operations to proceed. The transaction is ensuring that $10 is deducted from <code>A</code>
<em>if and only if</em> $10 is deposited into <code>B</code>, but two transactions can still be
reading and making decisions based on the same data concurrently.</p>
<p>(PostgreSQL by default does <em>not</em> allow two transactions to <em>write</em> the same
data concurrently; after Actor 1 updates <code>A</code>’s balance, Actor 2 isn’t able to
update <code>A</code>’s balance until after the first transaction is committed or rolled
back.)</p>
<h2 id="check-constraints"><code>check</code> constraints</h2>
<p>There are a few ways we can fix this. One way would be to add a check
constraint:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">alter</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="k">constraint</span><span class="w"> </span><span class="n">nonnegative_balance</span><span class="w"> </span><span class="k">check</span><span class="w"> </span><span class="p">(</span><span class="n">balance</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>With this constraint, Actor 2’s <code>update</code> will fail because the constraint would
be violated. In fact, we would no longer even need to check the previous balance
in our application code at all, because the database itself would ensure no
account’s balance ever goes below zero.</p>
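<p>Here’s a small runnable sketch of the constraint doing its job. To keep it self-contained it uses SQLite through Python’s built-in <code>sqlite3</code> module as a stand-in for PostgreSQL, so it’s an illustration of the idea rather than PostgreSQL behavior. SQLite can’t add a constraint with <code>alter table</code>, so the check is declared in <code>create table</code>. The helper names are my own.</p>

```python
import sqlite3


def make_bank():
    """Create an in-memory accounts table with a non-negative balance check."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table accounts ("
        " name text primary key,"
        " balance int not null,"
        " constraint nonnegative_balance check (balance >= 0))"
    )
    conn.execute("insert into accounts (name, balance) values ('A', 10), ('B', 0)")
    conn.commit()
    return conn


def move_money(conn, source, dest, amount):
    """Move money with no balance check in application code."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "update accounts set balance = balance - ? where name = ?",
                (amount, source),
            )
            conn.execute(
                "update accounts set balance = balance + ? where name = ?",
                (amount, dest),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the update that would overdraw the account.
        return False


def balances(conn):
    return dict(conn.execute("select name, balance from accounts order by name"))
```

<p>The first <code>move_money(conn, "A", "B", 10)</code> succeeds. A second identical call is rejected by the constraint and rolled back, leaving <code>A</code> at 0 and <code>B</code> at 10.</p>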
<h2 id="table-locks">Table locks</h2>
<p>Another approach would be to use a lock. Before we start reading or writing data
from the <code>accounts</code> table, we can use a lock to ensure that our transaction has
exclusive access to that table until we roll back or commit:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-diff" data-lang="diff"><span class="line"><span class="cl"> begin;
</span></span><span class="line"><span class="cl"><span class="gi">+lock table accounts;
</span></span></span><span class="line"><span class="cl"><span class="gi"></span> select balance from accounts where name = &#39;A&#39;;
</span></span><span class="line"><span class="cl"> update accounts set balance = balance - 10 where name = &#39;A&#39;;
</span></span><span class="line"><span class="cl"> update accounts set balance = balance + 10 where name = &#39;B&#39;;
</span></span><span class="line"><span class="cl"> commit;
</span></span></code></pre></div><p>The <code>lock table accounts</code> statement will not finish until no other transactions
have any locks on the <code>accounts</code> table, and will prevent all other transactions
from accessing the <code>accounts</code> table until our transaction is committed or rolled
back.</p>
<h2 id="row-locks">Row locks</h2>
<p>Locking the entire accounts table is an effective way to prevent overdrawing an
account, but it also needlessly slows down our banking program. If someone is
trying to move money from <code>A</code> to <code>B</code> while someone else is trying to move money
from <code>B</code> to <code>C</code>, the second person’s transaction won’t be able to start until
the first transaction completes, even though they’re touching different
accounts.</p>
<p>Luckily, rather than acquiring a lock on the entire table, we can just acquire a
lock on the row that we’re deducting money from. To do this, we can use <code>for update</code> at the end of our <code>select</code> statement:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">select</span><span class="w"> </span><span class="n">balance</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;A&#39;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="k">update</span><span class="p">;</span><span class="w">
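</span></span></span></code></pre></div><p>Now, other transactions won’t be able to lock or modify this row until our transaction is
committed or rolled back. A plain <code>select</code> can still read the row, but a concurrent <code>moveMoney</code> call that issues the same <code>select ... for update</code> will wait until we’re done and then see the updated balance. (The lock is released when the transaction ends, so <code>for update</code> is only useful inside a transaction.)</p>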
<h2 id="transaction-isolation-levels">Transaction isolation levels</h2>
<p>One other way to ensure that we don’t overdraw an account is to change the
isolation level of the transaction:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">begin</span><span class="w"> </span><span class="k">transaction</span><span class="w"> </span><span class="k">isolation</span><span class="w"> </span><span class="k">level</span><span class="w"> </span><span class="k">serializable</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">select</span><span class="w"> </span><span class="n">balance</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;A&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">update</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">set</span><span class="w"> </span><span class="n">balance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">balance</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;A&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">update</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">set</span><span class="w"> </span><span class="n">balance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">balance</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;B&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">commit</span><span class="p">;</span><span class="w">
</span></span></span></code></pre></div><p>The PostgreSQL manual has <a href="https://www.postgresql.org/docs/12/sql-set-transaction.html">a good description of <code>serializable</code></a>:</p>
<blockquote>
<p>If a pattern of reads and writes among concurrent serializable transactions
would create a situation which could not have occurred for any serial
(one-at-a-time) execution of those transactions, one of them will be rolled
back with a <code>serialization_failure</code> error.</p>
</blockquote>
<p>Where a row or table lock would make a second transaction wait for the first one to
commit or roll back before proceeding, with <code>isolation level serializable</code> the second transaction’s <code>update</code> would instead fail with an error
message: “could not serialize access due to concurrent update.”</p>
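<p>In practice this means code using serializable transactions needs to be ready to retry. Here’s a minimal sketch of such a retry loop in Python. The <code>SerializationFailure</code> class is a hypothetical stand-in for whatever exception your database driver raises for SQLSTATE <code>40001</code> (<code>serialization_failure</code>), and <code>run_with_retry</code> is a name I made up.</p>

```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's serialization_failure error."""


def run_with_retry(transaction, max_attempts=3):
    """Call transaction() again from the start if it fails to serialize."""
    for attempt in range(1, max_attempts + 1):
        try:
            # transaction() is expected to begin, do its work, and commit.
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
```

<p>The important detail is that the <em>whole</em> transaction is retried, including the balance check, so the retry makes its decision based on fresh data.</p>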
<p>There’s a good explanation of the “serializable” consistency model—and how it
differs from other models—<a href="https://jepsen.io/consistency">on the Jepsen site</a>.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      One thing I wish I had understood better earlier on in my experience with PostgreSQL is how transactions and locks can be used together to provide serializable logic.
An easy way to illustrate this is with a simple bank account system. Suppose we create an accounts table and populate it like this:
create table accounts ( name text primary key, balance int not null ); insert into accounts (name, balance) values (&#39;A&#39;, 10), (&#39;B&#39;, 0); Now we have two bank accounts, A with a balance of $10, and B with a balance of $0.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Flame Graphs for Go With pprof</title>
    <link href="https://www.benburwell.com/posts/flame-graphs-for-go-with-pprof/" rel="alternate" type="text/html" title="Flame Graphs for Go With pprof" />
    <published>2022-03-11T00:00:00Z</published>
    <updated>2022-03-11T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/flame-graphs-for-go-with-pprof/</id>
    <content type="html" xml:base="/posts/flame-graphs-for-go-with-pprof/">
      <![CDATA[
        <p>This week, I was working on a Go program and I wanted to understand which part
was taking the most time. I had seen some people use flame graphs for this, but
had never made one myself, so I decided to try it out. It took a little time to
figure out the right tools to use, but once I did it was pretty easy. Here’s
what one looks like (<a href="https://static.benburwell.com/blog/flame.svg">full size</a>):</p>
<p><object style="max-width:100%" data="https://static.benburwell.com/blog/flame.svg"></object></p>
<p>On the X axis, a wider bar means more time spent, and on the Y axis you can see
the call stack (functions lower down are calling functions higher up). There’s
no particular meaning to the colors.</p>
<aside>
  If you’re just here for the commands, <a href="#tldr-how-to-make-a-flame-graph-from-a-pprof-source">skip to the end</a>!
</aside>
<p>Unfortunately I can’t share the <em>actual</em> program I was working on, but I’ll show
you the steps on a little program called <code>whoami</code> which starts a web server that
responds to each request by simply writing back the IP address of the requester:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="s">&#34;log&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="s">&#34;net/http&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="nx">_</span> <span class="s">&#34;net/http/pprof&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nx">http</span><span class="p">.</span><span class="nf">HandleFunc</span><span class="p">(</span><span class="s">&#34;/&#34;</span><span class="p">,</span> <span class="nx">handle</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="nx">log</span><span class="p">.</span><span class="nf">Fatal</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nf">ListenAndServe</span><span class="p">(</span><span class="s">&#34;:8080&#34;</span><span class="p">,</span> <span class="kc">nil</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">handle</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nx">log</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;handling request from: %s&#34;</span><span class="p">,</span> <span class="nx">r</span><span class="p">.</span><span class="nx">RemoteAddr</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="k">if</span> <span class="nx">_</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">w</span><span class="p">.</span><span class="nf">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">RemoteAddr</span><span class="p">));</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nx">log</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;could not write IP: %s&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Go’s standard library includes some tools for profiling the running program
through its various <code>pprof</code> packages and utilities. Here, I’m importing
<a href="https://pkg.go.dev/net/http/pprof"><code>net/http/pprof</code></a>, which exposes <code>/debug/pprof</code> endpoints
on the <a href="https://pkg.go.dev/net/http#DefaultServeMux"><code>DefaultServeMux</code></a>.</p>
<h2 id="profiling-our-web-server">Profiling our web server</h2>
<p>I want to make sure <code>whoami</code> is actually serving requests when I profile it, so
I’m using <a href="https://github.com/wg/wrk">wrk</a> to generate lots of requests by running <code>wrk -d 30s 'http://localhost:8080'</code>. With that running, I can fetch a 20-second CPU profile
from the pprof server:</p>
<pre tabindex="0"><code>$ go tool pprof \
  -raw -output=cpu.txt \
  &#39;http://localhost:8080/debug/pprof/profile?seconds=20&#39;
</code></pre><p>This creates a file called <code>cpu.txt</code> containing the decoded pprof samples, which
are what I need to build my flame graph.</p>
<p>Brendan Gregg, <a href="https://www.brendangregg.com/flamegraphs.html">the inventor of flame graphs</a>, has published some
scripts to turn pprof output into an interactive flame graph. Since <code>whoami</code> is
a Go program, I can use <code>stackcollapse-go.pl</code> to convert the samples to the
right format for <code>flamegraph.pl</code> (both from <a href="https://github.com/brendangregg/FlameGraph">this repository</a>):</p>
<pre tabindex="0"><code>$ ./stackcollapse-go.pl cpu.txt | ./flamegraph.pl &gt; flame.svg
</code></pre><p><a href="https://static.benburwell.com/blog/flame.svg">Click here to see the result!</a></p>
<h2 id="make-it-faster">Make it faster!</h2>
<p>One thing I noticed though is that about 30% of the time it takes to <code>serve()</code>
each request seems to be spent in <code>log.Printf</code>, which needs to make a <code>write</code>
system call to print the message to the terminal:</p>
<p><img src="https://static.benburwell.com/blog/flame_log.webp" alt="Annotated flame graph showing log.Printf about a third as wide as serve"></p>
<p>Maybe we can make our server faster by removing the logging? But to know if we
can make it “faster,” we need to know how fast it is right now.</p>
<p>One interesting thing about flame graphs is that they don’t measure time in
seconds, they measure it in <em>samples</em>. When you run pprof, it checks what your
program is doing <a href="https://cs.opensource.google/go/go/+/refs/tags/go1.17.8:src/runtime/pprof/pprof.go;l=762-771">100 times per second</a>, so the flame graph is
just an aggregation of “how many times was each unique stack trace sampled.”
(This means that if you have a function that’s called rarely, it might not even
appear in the flame graph at all!)</p>
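<p>To make the “aggregation of samples” idea concrete, here’s a rough sketch of the
counting step. The sample data and the <code>countStacks</code> helper are made up for
illustration; the real scripts do more work than this:</p>

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// countStacks tallies how many times each unique call stack was sampled.
// Each sample lists frames from the outermost caller to the innermost callee.
func countStacks(samples [][]string) map[string]int {
	counts := make(map[string]int)
	for _, stack := range samples {
		counts[strings.Join(stack, ";")]++
	}
	return counts
}

func main() {
	// Made-up samples, as if the profiler had interrupted the program 4 times.
	samples := [][]string{
		{"main", "serve", "handle", "log.Printf"},
		{"main", "serve", "handle", "w.Write"},
		{"main", "serve", "handle", "log.Printf"},
		{"main", "serve", "readRequest"},
	}

	counts := countStacks(samples)

	// Sort the keys so the output order is stable.
	stacks := make([]string, 0, len(counts))
	for s := range counts {
		stacks = append(stacks, s)
	}
	sort.Strings(stacks)

	for _, s := range stacks {
		fmt.Printf("%s %d\n", s, counts[s])
	}
}
```

<p>A function that never happens to be running when a sample is taken simply never
shows up as a key in the map.</p>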
<p>So to tell how fast <code>whoami</code> is in absolute terms, we can use <code>wrk</code> to gather
some initial statistics. I want to run <code>wrk</code> again (rather than use the results
from when I was profiling) because profiling your program will slow it down.</p>
<pre tabindex="0"><code>$ wrk -d 10s &#39;http://localhost:8080&#39;
Running 10s test @ http://localhost:8080
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   152.41us   51.27us   2.81ms   86.24%
    Req/Sec    31.30k     2.27k   39.31k    72.77%
  628837 requests in 10.10s, 76.76MB read
Requests/sec:  62261.70
Transfer/sec:      7.60MB
</code></pre><p>When I remove the call to <code>log.Printf</code> and re-run the server, <code>wrk</code> now reports:</p>
<pre tabindex="0"><code>$ wrk -d 10s &#39;http://localhost:8080&#39;
Running 10s test @ http://localhost:8080
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   117.23us   30.28us   1.02ms   78.11%
    Req/Sec    40.28k     3.09k   47.51k    69.80%
  809605 requests in 10.10s, 98.83MB read
Requests/sec:  80160.78
Transfer/sec:      9.79MB
</code></pre><p>Sure enough, the average latency decreased by around 23% (from 152 to 117 µs),
and the requests per second correspondingly increased from 62k to 80k, around
29%!</p>
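<p>For anyone who wants to check the arithmetic, here’s a tiny sketch that computes
the relative change from the two <code>wrk</code> runs above (<code>percentChange</code> is just a
helper name I picked). By this calculation the latency drop is closer to 23%,
and the throughput gain is a little under 29%:</p>

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
// Negative values mean the number went down.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Figures copied from the two wrk runs above.
	fmt.Printf("avg latency:  %+.1f%%\n", percentChange(152.41, 117.23))
	fmt.Printf("requests/sec: %+.1f%%\n", percentChange(62261.70, 80160.78))
}
```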
<p>I don’t have super high confidence in this measurement though. I’m doing this
all on my laptop, so I’m not sure whether the system calls that <code>wrk</code> is making
to send the requests are slowing down the system calls <code>whoami</code> is making to
read the requests and write the responses.</p>
<h2 id="tldr-how-to-make-a-flame-graph-from-a-pprof-source">tl;dr: How to Make a Flame Graph from a <code>pprof</code> source</h2>
<p>Download the scripts from <a href="https://github.com/brendangregg/FlameGraph">Brendan Gregg’s <code>FlameGraph</code> repo</a> and
then assuming <code>&lt;source&gt;</code> is either a pprof file or URL, run these commands:</p>
<pre tabindex="0"><code>$ go tool pprof -raw -output=cpu.txt &lt;source&gt;
$ stackcollapse-go.pl cpu.txt | flamegraph.pl &gt; cpu.svg
</code></pre><p>You can also use pprof's web UI to do this without needing any external scripts:</p>
<pre tabindex="0"><code>$ go tool pprof -http=: &lt;source&gt;
</code></pre><p>Then navigate to View &gt; Flame Graph in the pprof UI that opens in your browser.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      This week, I was working on a Go program and I wanted to understand which part was taking the most time. I had seen some people use flame graphs for this, but had never made one myself, so I decided to try it out. It took a little time to figure out the right tools to use, but once I did it was pretty easy. Here’s what one looks like (full size):
    </summary>
  </entry>
  
  <entry>
    <title type="html">Contributing to the aerc email client</title>
    <link href="https://www.benburwell.com/posts/aerc/" rel="alternate" type="text/html" title="Contributing to the aerc email client" />
    <published>2022-03-05T00:00:00Z</published>
    <updated>2022-03-05T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/aerc/</id>
    <content type="html" xml:base="/posts/aerc/">
      <![CDATA[
        <p><a href="https://aerc-mail.org">Aerc</a> is an open-source email client that runs in your terminal. During 2019
and early 2020, I contributed 32 patches to the project. Many of them were minor
enhancements, bug fixes, and documentation updates, but I also contributed a
number of more substantial features.</p>
<p>While I no longer use aerc regularly, I found it really rewarding to work with
the other contributors and to interact with the community of people who were
using software I helped write.</p>
<h2 id="maildir-backend">Maildir backend</h2>
<p>When I first encountered aerc, it only supported browsing via IMAP. At the time,
however, all of my email was synced to my machine and stored in a Maildir
(essentially, a specific directory structure where each file represents an email
message). Because I wanted to use aerc, and I didn’t want to lose my offline
mail reading capabilities, I decided to add support for Maildir backends in
addition to the existing IMAP.</p>
<p>While the maintainer had already planned to support multiple backends, including
Maildir, only the IMAP backend had been implemented. This meant that there were
a few places where the UI layer was tightly coupled with IMAP-specific
functionality, so my first task was to extract slightly more generic models that
could be used by various backends.</p>
<p>For example, many IMAP operations specify messages by their UID, which is a
server-assigned unique identifier. The UI layer had initially been implemented
in a way that used UIDs to distinguish between messages. However, in the context
of a Maildir, messages don’t have server-assigned UIDs, but the UI layer still
needs some way to keep track of which message(s) are being selected in the
browser.</p>
<p>After a few revisions, <a href="https://lists.sr.ht/~sircmpwn/aerc/%3C20190711134454.80318-1-ben%40benburwell.com%3E">my patch set for Maildir support was applied</a>.</p>
<h2 id="unsubscribe-command"><code>:unsubscribe</code> command</h2>
<p>Aerc is somewhat vim-like in its keyboard interface. Users type commands into a
command line (or use keybindings to enter common commands more quickly), and
then press the Return/Enter key to perform the action. Some of the common
commands include things like <code>:reply</code>, <code>:send</code>, <code>:attach</code>, etc.</p>
<p>At this point, I think most people are probably familiar with “Unsubscribe”
links that are included in messages from mailing lists. Astute users of certain
mail readers like Gmail or Apple Mail may have even noticed that the mail
application itself sometimes gives you a convenient “Unsubscribe” <em>button</em>,
outside of the email display itself:</p>
<p class="flex flex-wrap justify-content-center">
  <img class="p-1" alt="An unsubscribe button in the iOS Mail app" src="https://static.benburwell.com/blog/unsubscribe-ios.webp">
  <img class="p-1" alt="An unsubscribe link in Gmail" src="https://static.benburwell.com/blog/unsubscribe-gmail.webp">
</p>
<p>I had always assumed that these were being parsed out of the message body<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>,
but it turns out that <a href="https://datatracker.ietf.org/doc/html/rfc2369">RFC 2369</a> defines a <code>List-Unsubscribe</code> mail header that
senders can add to their messages to provide a convenient way for people to
unsubscribe.</p>
<p>The <code>List-Unsubscribe</code> header can contain one or more URLs. A <code>mailto:</code> URL
can be used to specify an unsubscribe address, or an HTTP URL can specify a web
page to visit in order to unsubscribe. I sent a patch to <a href="https://git.sr.ht/~rjarry/aerc/commit/030f39043628f01b174ebb11595a4e74da95f0b3">add an <code>:unsubscribe</code>
command</a> to aerc.</p>
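<p>As a rough illustration of what handling this header involves (this is not the
code from my patch, and <code>parseListUnsubscribe</code> and the example header value are
made up), the URLs are enclosed in angle brackets and separated by commas, so
extracting them might look something like this:</p>

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseListUnsubscribe extracts the URLs enclosed in angle brackets from a
// List-Unsubscribe header value, skipping anything that fails to parse.
func parseListUnsubscribe(value string) []*url.URL {
	var urls []*url.URL
	for {
		start := strings.IndexByte(value, '<')
		if start < 0 {
			break
		}
		end := strings.IndexByte(value[start:], '>')
		if end < 0 {
			break
		}
		raw := strings.TrimSpace(value[start+1 : start+end])
		if u, err := url.Parse(raw); err == nil && u.Scheme != "" {
			urls = append(urls, u)
		}
		value = value[start+end+1:]
	}
	return urls
}

func main() {
	// Example header value, invented for this sketch.
	header := "<mailto:unsubscribe@example.com?subject=unsubscribe>, " +
		"<https://example.com/unsubscribe?id=123>"

	for _, u := range parseListUnsubscribe(header) {
		switch u.Scheme {
		case "mailto":
			fmt.Println("send an email to:", u.Opaque)
		case "http", "https":
			fmt.Println("open in a browser:", u.String())
		}
	}
}
```

<p>A mail client can then decide which method to offer, for example preferring
<code>mailto:</code> since it can send the message itself.</p>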
<h2 id="address-book-integration">Address book integration</h2>
<p>The third major feature I contributed was support for integrating an address
book or contact list. Because many of aerc’s users tend to be familiar with or
already use terminal-oriented utilities, I took a lot of inspiration
from the way mutt (another command line email client) handles contact
integration.</p>
<p>I <a href="https://lists.sr.ht/~sircmpwn/aerc/patches/9354">added an <code>address-book-cmd</code> configuration option</a> that enabled
users to configure an external command which aerc could run to fetch address
completions. This command was expected to print out email addresses from the
user’s address book, and aerc would present the options in its tab completion
system.</p>
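<p>To give a feel for what such a command could look like, here’s a toy sketch. I’m
assuming here that the command receives the text typed so far as its first
argument and prints one address per line; the hard-coded <code>contacts</code> list and the
<code>complete</code> helper are invented, and aerc’s documentation is the place to check
the exact contract:</p>

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// contacts is a hard-coded stand-in for a real address book.
var contacts = []string{
	"alice@example.com",
	"bob@example.com",
	"carol@example.org",
}

// complete returns the addresses containing query, compared case-insensitively.
// An empty query matches every contact.
func complete(query string) []string {
	var matches []string
	q := strings.ToLower(query)
	for _, addr := range contacts {
		if strings.Contains(strings.ToLower(addr), q) {
			matches = append(matches, addr)
		}
	}
	return matches
}

func main() {
	// Assumption: the partial address arrives as the first argument.
	query := ""
	if len(os.Args) > 1 {
		query = os.Args[1]
	}
	for _, addr := range complete(query) {
		fmt.Println(addr)
	}
}
```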
<p>For my own personal use of this feature, I also created <a href="https://git.sr.ht/~benburwell/cdc">a janky little
program</a> that would print out completions for use in aerc by hitting a
CardDAV endpoint.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>And maybe in some cases, they actually are — I don’t have any visibility
into how these email clients are implemented!&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      Aerc is an open-source email client that runs in your terminal. During 2019 and early 2020, I contributed 32 patches to the project. Many of them were minor enhancements, bug fixes, and documentation updates, but I also contributed a number of more substantial features.
While I no longer use aerc regularly, I found it really rewarding to work with the other contributors and to interact with the community of people who were using software I helped write.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Lutron Universal Wireshark</title>
    <link href="https://www.benburwell.com/posts/lutron-universal-wireshark/" rel="alternate" type="text/html" title="Lutron Universal Wireshark" />
    <published>2022-02-27T00:00:00Z</published>
    <updated>2022-02-27T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/lutron-universal-wireshark/</id>
    <content type="html" xml:base="/posts/lutron-universal-wireshark/">
      <![CDATA[
        <p>One of my all-time favorite tools is <a href="https://www.wireshark.org">Wireshark</a>. During college, my summer
internship at <a href="http://www.lutron.com/">Lutron Electronics</a> was focused on packaging a custom
internal build of Wireshark, complete with new dissectors for Lutron’s
proprietary network protocols.</p>
<p>Lutron’s lighting control hardware communicates using a variety of proprietary
wired and wireless link protocols. My goal was to make it quicker and easier for
R&amp;D engineers and field technicians to debug and verify hardware by enabling
them to capture and dissect these proprietary protocols using Wireshark.</p>
<p>There had been prior efforts to build Lutron protocol dissectors into Wireshark,
but there had been a few challenges:</p>
<ul>
<li>The customized builds of Wireshark would become outdated when new commands
were added to the Lutron protocols, and when new versions of Wireshark were
released.</li>
<li>Different teams had built dissectors for their specific protocols, meaning
that there wasn’t a single version of Wireshark which could capture and
dissect any Lutron protocol.</li>
<li>Capturing was limited to Ethernet-based protocols, and was not available for
serial data.</li>
</ul>
<p>After meeting with stakeholders across the company to gain a better
understanding of the problem, I went to work trying to resolve the issues that
people were facing.</p>
<p>First, I created a Jenkins pipeline on an existing CI server so that when a new
release of Wireshark was published, we could simply run the pipeline to compile
and package a new installer, and publish it to an internal network drive.</p>
<p>Next, I looked at the dissector code other teams had written and worked to
integrate it into the CI/CD pipeline. However, this didn’t completely solve
the problem of new commands being added and not being reflected in Wireshark.
To make this easier, I wrote a script that would parse specially-formatted
comments out of a C header file and generate appropriate Wireshark dissector
code.</p>
<p>Finally, to address the need to view serial data in Wireshark, I wrote a
program to capture data from a USB serial interface and output it in pcap
format, and wrote a small wrapper script in Lua to <a href="https://www.wireshark.org/docs/wsdg_html_chunked/wslua_menu_example.html">expose it as a Wireshark
plugin</a>.</p>
<p>I wrapped up the summer by presenting sessions about how to use Wireshark to
teams across the company, including product development, QA, and field service.</p>
<p>While I spent a good chunk of time working on Wireshark, most of my work
unfortunately could not be contributed upstream due to its proprietary nature. I
<em>was</em> able to contribute <a href="https://gitlab.com/wireshark/wireshark/-/commit/830d1b1ce9905e287386c9e8bc638c26380d77cb">a minor patch</a> to resolve a quoting issue in a
build script.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      One of my all-time favorite tools is Wireshark. During college, my summer internship at Lutron Electronics was focused on packaging a custom internal build of Wireshark, complete with new dissectors for Lutron’s proprietary network protocols.
Lutron’s lighting control hardware communicates using a variety of proprietary wired and wireless link protocols. My goal was to make it quicker and easier for R&amp;D engineers and field technicians to debug and verify hardware by enabling them to capture and dissect these proprietary protocols using Wireshark.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Intercepting Go TLS Connections with Wireshark</title>
    <link href="https://www.benburwell.com/posts/intercepting-golang-tls-with-wireshark/" rel="alternate" type="text/html" title="Intercepting Go TLS Connections with Wireshark" />
    <published>2021-05-14T00:00:00Z</published>
    <updated>2021-05-14T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/intercepting-golang-tls-with-wireshark/</id>
    <content type="html" xml:base="/posts/intercepting-golang-tls-with-wireshark/">
      <![CDATA[
        <p>I wrote previously about how I like to <a href="https://www.benburwell.com/posts/debugging-http-services-with-mitmproxy/">use mitmproxy for debugging HTTP
services</a>. This is a continued exploration of debugging network
services, in particular focused around inspecting TLS encrypted traffic that
your application is sending and receiving.</p>
<p>Transport Layer Security is a fundamental building block of modern secure
communications on the Internet, and increasingly the software we write is
expected to be a fluent speaker of TLS. While this brings security benefits for
users, it also increases the complexity of understanding what our software is
doing because when we try to use tools like Wireshark or tcpdump to inspect
network traffic, all we see is encrypted data. Let’s see what a regular HTTP
request looks like in Wireshark:</p>
<pre tabindex="0"><code>$ curl http://www.benburwell.com
</code></pre><p><img src="https://static.benburwell.com/blog/tls-wireshark/plain.png" alt="A Wireshark packet capture showing a plain HTTP request and
response"></p>
<p>Here, we can see the HTTP request and response. But what happens when we make
the request over TLS?</p>
<pre tabindex="0"><code>$ curl https://www.benburwell.com
</code></pre><p><img src="https://static.benburwell.com/blog/tls-wireshark/tls.png" alt="A Wireshark packet capture showing encrypted TLS application
data"></p>
<p>Here all we see are some TLS packets with embedded “encrypted application data.”
We can see that a connection is being made, but we can’t inspect the raw HTTP
request or response as we’d like to.</p>
<p>But all is not lost! There is a way for Wireshark to decrypt TLS connections and
show you dissected application protocol packets; it just requires a little
configuration. To understand how this works, we first need to understand a
little bit about TLS.</p>
<h2 id="how-decrypting-tls-in-wireshark-works">How decrypting TLS in Wireshark works</h2>
<p>TLS encrypts data within a session using symmetric keys derived from a “master
secret,” a shared value that is established by using a key exchange protocol. So
in order for Wireshark to be able to decrypt and dissect TLS packets, we need
some way to tell it the master secret for the session.</p>
<p>The master secret is agreed upon using a cryptographic protocol when the TLS
connection is established. The exact implementation varies, but in general the
client and the server use some clever math to derive a value that is known at
both ends and yet is never directly sent over the wire, such that it is
computationally expensive for intermediate observers to derive the secret for
themselves. We won’t get into <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Key_exchange_or_key_agreement">the specifics</a>, but one important detail for
later is that this exchange involves the client sending the server a large
random number in plain text, before the encrypted stream begins.</p>
<p>Conveniently, many TLS client libraries support the use of a key log file, which
does pretty much exactly what it sounds like: when the <code>SSLKEYLOGFILE</code>
environment variable is set, the library writes the key needed to decrypt the
traffic each time it establishes a TLS connection. Originally, this was
implemented in Mozilla’s (at the time Netscape’s) Network Security Services
library, so you might also see it referred to as a “<a href="https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/Key_Log_Format">NSS Key Log File</a>.”
Let’s give this a try!</p>
<pre tabindex="0"><code>$ SSLKEYLOGFILE=/tmp/keys curl -s https://www.benburwell.com &gt;/dev/null
$ cat /tmp/keys
CLIENT_RANDOM 40b1a54e6b38f7accb90e1f5162534b8628389f4257e39f614a3ca28514db2c7 3121d2812c459996b072165c2ece4a1c85687d7073de06be0e1c16bf4a862fbe26a8cba24db1a4a0a9684fb19ad52f97
</code></pre><p>(Note that <code>SSLKEYLOGFILE</code> support has only been enabled by default since curl
7.58, so if this isn’t working for you, check which version of curl you have.)</p>
<p>This line in the key log means that for the TLS connection that was initiated
with the <code>CLIENT_RANDOM</code> of <code>40b1...</code>, the master secret is <code>3121...</code>. So now we
just need to tell Wireshark about this. Let’s start a new capture and make
another request:</p>
<p><img src="https://static.benburwell.com/blog/tls-wireshark/tls.png" alt="A Wireshark packet capture showing encrypted TLS application
data"></p>
<p>Now, we can right-click on the “Transport Layer Security” layer and select
Protocol Preferences -&gt; (Pre)-Master-Secret log filename... and enter the path
to our <code>SSLKEYLOGFILE</code>, <code>/tmp/keys</code>, and something magical happens:</p>
<p><img src="https://static.benburwell.com/blog/tls-wireshark/decrypted.png" alt="A Wireshark packet capture showing TLS protocol packets and decrypted HTTP
traffic"></p>
<p>Now, when Wireshark encounters a TLS handshake, it can extract the random value
sent by the client and consult the key log file to discover a matching
<code>CLIENT_RANDOM</code> line and use the corresponding session master secret to decrypt
the data sent over the connection. So in addition to seeing the TLS details as
before, we can also see the decrypted HTTP requests!</p>
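<p>The lookup step is easy to picture in code. This is only a sketch of the idea,
not how Wireshark actually implements it; <code>parseKeyLog</code> is a made-up helper, it
only handles <code>CLIENT_RANDOM</code> lines, and the hex values are shortened
placeholders:</p>

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseKeyLog reads key log text and returns a map from the client random
// to the master secret (both as hex strings). Lines that are not of the
// form "CLIENT_RANDOM <random> <secret>" are ignored.
func parseKeyLog(log string) map[string]string {
	secrets := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 3 && fields[0] == "CLIENT_RANDOM" {
			secrets[fields[1]] = fields[2]
		}
	}
	return secrets
}

func main() {
	// Shortened placeholder values; real entries are much longer.
	keyLog := "# example key log\n" +
		"CLIENT_RANDOM 40b1a54e 3121d281\n" +
		"CLIENT_RANDOM 9f00c2d7 77aa01bc\n"

	secrets := parseKeyLog(keyLog)

	// Pretend this value was read from a captured handshake.
	seen := "40b1a54e"
	if secret, ok := secrets[seen]; ok {
		fmt.Println("master secret for", seen, "is", secret)
	} else {
		fmt.Println("no key logged for", seen)
	}
}
```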
<h2 id="configuring-go-to-use-a-tls-key-file">Configuring Go to use a TLS Key File</h2>
<p>Go doesn’t support the <code>SSLKEYLOGFILE</code> environment variable directly, but it
does have a different mechanism to achieve the same result. The
<code>crypto/tls.Config</code> struct has a <code>KeyLogWriter</code> field:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// KeyLogWriter optionally specifies a destination for TLS master secrets
</span></span></span><span class="line"><span class="cl"><span class="c1">// in NSS key log format that can be used to allow external programs
</span></span></span><span class="line"><span class="cl"><span class="c1">// such as Wireshark to decrypt TLS connections.
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nx">KeyLogWriter</span> <span class="nx">io</span><span class="p">.</span><span class="nx">Writer</span>
</span></span></code></pre></div><p>In typical Go fashion, I/O has been abstracted to an <code>io.Writer</code> interface.
Since we can use an <code>*os.File</code> to satisfy this interface, all we need to do to
produce a file containing the TLS secrets is to open a file and pass that
through the <code>tls.Config.KeyLogWriter</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="s">&#34;crypto/tls&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="s">&#34;net/http&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="s">&#34;os&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nx">f</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">os</span><span class="p">.</span><span class="nf">OpenFile</span><span class="p">(</span><span class="s">&#34;/tmp/keys&#34;</span><span class="p">,</span> <span class="nx">os</span><span class="p">.</span><span class="nx">O_APPEND</span><span class="p">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="p">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_WRONLY</span><span class="p">,</span> <span class="mo">0600</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="k">defer</span> <span class="nx">f</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="nx">client</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Client</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nx">Transport</span><span class="p">:</span> <span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Transport</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">      <span class="nx">TLSClientConfig</span><span class="p">:</span> <span class="o">&amp;</span><span class="nx">tls</span><span class="p">.</span><span class="nx">Config</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nx">KeyLogWriter</span><span class="p">:</span> <span class="nx">f</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="p">},</span>
</span></span><span class="line"><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="nx">client</span><span class="p">.</span><span class="nf">Get</span><span class="p">(</span><span class="s">&#34;https://www.benburwell.com&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Building and running our program results in <code>CLIENT_RANDOM</code> lines being appended
to our <code>/tmp/keys</code> file and picked up by Wireshark, which in turn is able to
decrypt the messages being sent by our program:</p>
<p><img src="https://static.benburwell.com/blog/tls-wireshark/go_decrypted.png" alt="A Wireshark packet capture showing decrypted TLS traffic sent by
Go"></p>
<p>In practice, for decrypting HTTP traffic for debugging, I find
<a href="https://www.benburwell.com/posts/debugging-http-services-with-mitmproxy/">mitmproxy</a> to be faster and easier, since it doesn’t require
changes to the program. However, sometimes it’s preferable to look at the actual
bytes on the wire, which is where using a key log file with Wireshark might be a
better approach.</p>
<p>Additionally, there are plenty of protocols other than HTTP that use TLS
connections, and where proxying isn’t an option. For example, I’ve used a key
log file with Wireshark to debug a Go program that was making an IMAP connection
to a mail server. Because of the way Go’s libraries tend to be layered, the code
to do this was very similar to the HTTP example above; I just needed to use my
custom <code>tls.Config</code> when constructing an IMAP client instead of an HTTP client.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I wrote previously about how I like to use mitmproxy for debugging HTTP services. This is a continued exploration of debugging network services, in particular focused around inspecting TLS encrypted traffic that your application is sending and receiving.
Transport Layer Security is a fundamental building block of modern secure communications on the Internet, and increasingly the software we write is expected to be a fluent speaker of TLS. While this brings security benefits for users, it also increases the complexity of understanding what our software is doing because when we try to use tools like Wireshark or tcpdump to inspect network traffic, all we see is encrypted data.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Debugging HTTP services with mitmproxy</title>
    <link href="https://www.benburwell.com/posts/debugging-http-services-with-mitmproxy/" rel="alternate" type="text/html" title="Debugging HTTP services with mitmproxy" />
    <published>2021-05-06T00:00:00Z</published>
    <updated>2021-05-06T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/debugging-http-services-with-mitmproxy/</id>
    <content type="html" xml:base="/posts/debugging-http-services-with-mitmproxy/">
      <![CDATA[
        <p>I spend a lot of my time at work writing Go services that talk to other Go
services over HTTP. Much of the time, everything works as expected, but every
now and then a situation arises where I’m struggling to understand why my
program is receiving a specific value. Is my request not being built correctly?
Am I not properly deserializing the response? Logging can be helpful, but
sometimes I really just want to look at the HTTP traffic between services.</p>
<p>One tool that I really love in these situations is <a href="https://mitmproxy.org/">mitmproxy</a>: “a free and
open source interactive HTTPS proxy,” according to its website.</p>
<p>There is no shortage of features and options for mitmproxy, and when I was
first exploring it they were a little daunting. I’m sure there are still tons of
things that it can do that I don’t know about, but the main thing I tend to use
it for is reverse proxying.</p>
<p>Reverse proxying is a pretty simple concept; basically it means that you send
your HTTP requests to a specific proxy endpoint, and the proxy repeats your
request to some specific origin server. Then the same thing just happens
backwards: the origin server sends its reply to the reverse proxy which passes
it along back to you. This means that the proxy can see (and log!) the actual
HTTP request you send, and the response sent by the origin server.</p>
<p>Earlier today, I was working on a program that sent requests to an HTTP server
and my program’s output didn’t make sense. I wasn’t sure if my requests were
being sent incorrectly, or maybe there was a bug in the server I was talking to.
So I fired up mitmproxy to take a look. In my shell, I ran:</p>
<pre tabindex="0"><code>mitmproxy --mode reverse:https://service.dev.example.com --listen-port 8080
</code></pre><p>This opens up a log window where any requests handled by the reverse proxy will
be displayed. I quickly updated my program to make its requests to
<code>http://localhost:8080</code> instead of <code>https://service.dev.example.com</code> and re-ran
it. The request and response were logged in the terminal window, and I was
quickly able to identify that a particular dependency needed to be updated.</p>
<p>Of course, you could also use Wireshark or tcpdump to inspect network traffic,
and these are great options that I also use frequently! But the main reason I
tend to turn first to mitmproxy is that it handles TLS like a <a href="https://youtu.be/4r7wHMg5Yjg">honey
badger</a>: it just doesn’t give a shit. Basically, you can throw whatever you
want at it and it’ll just do the right thing:</p>
<pre tabindex="0"><code>Client  --TLS--&gt;   mitmproxy  --TLS--&gt;   Origin
Client  --HTTP--&gt;  mitmproxy  --TLS--&gt;   Origin
Client  --TLS--&gt;   mitmproxy  --HTTP--&gt;  Origin
Client  --HTTP--&gt;  mitmproxy  --HTTP--&gt;  Origin
</code></pre><p>How does this work? When you first run mitmproxy, it generates a certificate
authority that it uses to generate certificates on-the-fly. All you need to do
is add the CA certificate to your OS trust store (see their <a href="https://docs.mitmproxy.org/stable/concepts-certificates/">docs about
certificates here</a>). For example, if I run <code>mitmproxy --mode reverse:https://www.benburwell.com --listen-port 8080</code>, and then connect over
TLS, I can see the certificate that mitmproxy generated:</p>
<pre tabindex="0"><code>$ openssl s_client -connect localhost:8080
CONNECTED(00000005)
depth=1 CN = mitmproxy, O = mitmproxy
verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/CN=www.benburwell.com
   i:/CN=mitmproxy/O=mitmproxy
 1 s:/CN=mitmproxy/O=mitmproxy
   i:/CN=mitmproxy/O=mitmproxy
</code></pre><p>Here, the <code>mitmproxy</code> CA has generated a certificate with <code>CN=www.benburwell.com</code> to
match the hostname I’m reverse proxying to!</p>
<p>Now, there are ways to snoop on TLS encrypted traffic with Wireshark as well
using a TLS key log file, but this usually involves making somewhat non-trivial
modifications to the program you’re working with. It’s not <em>very</em> complicated or
difficult, and it’s a technique I’ve used a few times, but mitmproxy is usually
quicker and easier for me. I plan to write a post about this topic in the
future, so stay tuned! <em>(Update: see <a href="../intercepting-golang-tls-with-wireshark/">my post about decrypting TLS in
Wireshark</a>)</em></p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I spend a lot of my time at work writing Go services that talk to other Go services over HTTP. Much of the time, everything works as expected, but every now and then a situation arises where I’m struggling to understand why my program is receiving a specific value. Is my request not being built correctly? Am I not properly deserializing the response? Logging can be helpful, but sometimes I really just want to look at the HTTP traffic between services.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Learning About Syscall Filtering With Seccomp</title>
    <link href="https://www.benburwell.com/posts/learning-about-syscall-filtering-with-seccomp/" rel="alternate" type="text/html" title="Learning About Syscall Filtering With Seccomp" />
    <published>2020-06-27T00:00:00Z</published>
    <updated>2022-04-20T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/learning-about-syscall-filtering-with-seccomp/</id>
    <content type="html" xml:base="/posts/learning-about-syscall-filtering-with-seccomp/">
      <![CDATA[
        <p>I’d heard about being able to <a href="https://docs.docker.com/engine/security/seccomp/">run Docker containers with a custom security
profile</a>, but wasn’t really sure what that meant or what was
happening behind the scenes, so I decided to do some experimentation to find
out.</p>
<p>It turns out that the Linux kernel includes a feature called “secure computing
mode,” or <code>seccomp</code> for short. Using <code>seccomp</code> lets you tell the kernel that you
only expect your program to use a specific set of system calls, and if your
program makes any system calls that aren’t in your approved list, the kernel
should kill your program.</p>
<p>But why would you want to do this? I think if you had a pretty simple program,
using <code>seccomp</code> might be overkill. But if your program makes different system
calls depending on possibly-untrustworthy user input, it might make sense to try
to limit what the program is allowed to do. Looking at <a href="https://en.wikipedia.org/wiki/Seccomp#Software_using_seccomp_or_seccomp-bpf">a list of software using
<code>seccomp</code> on Wikipedia</a> backs this up: the software listed is mostly
hypervisors/container runners (like Docker), web browsers, etc.</p>
<p>By reading <a href="https://man7.org/linux/man-pages/man2/seccomp.2.html">the manual page for the <code>seccomp(2)</code> system call</a>, we
can learn how to write a program to try this out. The simplest action is to
enter “strict mode,” which prevents all system calls except for <code>read(2)</code>,
<code>write(2)</code>, <code>_exit(2)</code>, and <code>sigreturn(2)</code>; in other words, what I think
should be just enough to write hello world! Let’s give it a shot:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;linux/seccomp.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;sys/prctl.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp"></span>
</span></span><span class="line"><span class="cl"><span class="kt">int</span>
</span></span><span class="line"><span class="cl"><span class="nf">main</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="nf">prctl</span><span class="p">(</span><span class="n">PR_SET_SECCOMP</span><span class="p">,</span> <span class="n">SECCOMP_MODE_STRICT</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="nf">perror</span><span class="p">(</span><span class="s">&#34;prctl&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">                <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;hello, world!</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>When I compile and run my program, I just see <strong>Killed</strong> being printed, not
<strong>hello, world!</strong>. Well, this is pretty good evidence that <code>seccomp</code> is doing
<em>something</em>: it’s at least killing my program! Let’s try to find out why it’s
being killed using <a href="https://strace.io/"><code>strace</code>, a program that shows you all of the system calls
being made</a>:</p>
<pre tabindex="0"><code>$ strace ./hello
execve(&#34;./hello&#34;, [&#34;./hello&#34;], 0x7fff77b754b0 /* 20 vars */) = 0
brk(NULL)                               = 0x559e08463000
access(&#34;/etc/ld.so.nohwcap&#34;, F_OK)      = -1 ENOENT (No such file or directory)
access(&#34;/etc/ld.so.preload&#34;, R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, &#34;/etc/ld.so.cache&#34;, O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25762, ...}) = 0
mmap(NULL, 25762, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe65b9f0000
close(3)                                = 0
access(&#34;/etc/ld.so.nohwcap&#34;, F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, &#34;/lib/x86_64-linux-gnu/libc.so.6&#34;, O_RDONLY|O_CLOEXEC) = 3
read(3, &#34;\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0&gt;\0\1\0\0\0\260\34\2\0\0\0\0\0&#34;...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fe65b9ee000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7fe65b3df000
mprotect(0x7fe65b5c6000, 2097152, PROT_NONE) = 0
mmap(0x7fe65b7c6000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7fe65b7c6000
mmap(0x7fe65b7cc000, 15072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe65b7cc000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7fe65b9ef4c0) = 0
mprotect(0x7fe65b7c6000, 16384, PROT_READ) = 0
mprotect(0x559e077b9000, 4096, PROT_READ) = 0
mprotect(0x7fe65b9f7000, 4096, PROT_READ) = 0
munmap(0x7fe65b9f0000, 25762)           = 0
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) = 0
fstat(1,  &lt;unfinished ...&gt;)             = ?
+++ killed by SIGKILL +++
Killed
</code></pre><p>There’s a lot at the beginning about loading dynamically linked libraries,
reading the program binary, and mapping it into memory that I don’t fully
understand. But the last few syscalls provide some clues: right after <code>prctl</code> is
called, we see <code>fstat</code> being called! <code>fstat</code> is a system call for getting the
status of a file, and <code>1</code> happens to be the file descriptor for standard output.
It makes sense that calling <code>printf</code> might involve checking the status of
standard output, so I tried commenting out the call to <code>printf</code> in <code>hello.c</code>.
When I compiled and ran the new version, it still just printed <strong>Killed</strong>, so I
used <code>strace</code> again. Just looking at the last few lines:</p>
<pre tabindex="0"><code>prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) = 0
exit_group(0)                           = ?
+++ killed by SIGKILL +++
Killed
</code></pre><p>Now my program is making the <code>exit_group</code> system call. Thinking back to the
manual page for <code>seccomp</code>, it said:</p>
<blockquote>
<p>The only system calls that the calling thread is permitted to make are
<code>read(2)</code>, <code>write(2)</code>, <code>_exit(2)</code> (but not <code>exit_group(2)</code>), and
<code>sigreturn(2)</code>.</p>
</blockquote>
<p>It looks like I’ll need to actually do some real filtering if I want to run my
hello world program and not just use strict mode. To do this, we need to use
<code>SECCOMP_MODE_FILTER</code> and pass a pointer to a <code>struct sock_fprog</code>, which
according to the manpage is “a Berkeley Packet Filter program designed to filter
arbitrary system calls and system call arguments.”</p>
<p>While we could construct a BPF program using an array of <code>struct sock_filter</code>s,
looking at the chain of instructions we’d need made me think it would be much
easier to enlist the services of <a href="https://github.com/seccomp/libseccomp"><code>libseccomp</code></a>, a library designed
for just this purpose. Let’s try rewriting <code>hello.c</code> to use <code>libseccomp</code> and
allowing those three syscalls we saw before (<code>fstat</code>, <code>write</code>, and
<code>exit_group</code>):</p>
<pre tabindex="0"><code>#include &lt;seccomp.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

scmp_filter_ctx ctx;

/* graceful_exit cleans up our seccomp context before exiting */
void
graceful_exit(int rc)
{
        seccomp_release(ctx);
        exit(rc);
}

/* setup_seccomp initializes seccomp and loads our BPF program that filters
 * syscalls into the kernel */
void
setup_seccomp()
{
        int rc;

        /* Initialize the seccomp filter state */
        if ((ctx = seccomp_init(SCMP_ACT_KILL)) == NULL) {
                graceful_exit(1);
        }
        if ((rc = seccomp_reset(ctx, SCMP_ACT_KILL)) != 0) {
                graceful_exit(1);
        }

        /* Add allowed system calls to the BPF program */
        if ((rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0)) != 0) {
                graceful_exit(1);
        }
        if ((rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0)) != 0) {
                graceful_exit(1);
        }
        if ((rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0)) != 0) {
                graceful_exit(1);
        }

        /* Load the BPF program for the current context into the kernel */
        if ((rc = seccomp_load(ctx)) != 0) {
                graceful_exit(1);
        }
}

int
main()
{
        setup_seccomp();
        printf(&#34;hello, world!\n&#34;);
        graceful_exit(0);
}
</code></pre><p>Since we’re now using <code>libseccomp</code>, we need to tell our C compiler to link the
library:</p>
<pre tabindex="0"><code>$ cc -o hello hello.c -lseccomp
$ ./hello
hello, world!
</code></pre><p>Success! Our program compiles and runs, and all of the necessary syscalls have
been allowed. Now let’s try modifying the <code>main()</code> function of our program to do
something bad, like trying to read the password file <code>/etc/shadow</code>:</p>
<pre tabindex="0"><code>int
main()
{
        FILE *fd;
        setup_seccomp();
        printf(&#34;hello, world!\n&#34;);
        if ((fd = fopen(&#34;/etc/shadow&#34;, &#34;r&#34;)) == NULL) {
                perror(&#34;fopen&#34;);
                graceful_exit(1);
        }
        fclose(fd);
        graceful_exit(0);
}
</code></pre><p>Now when we compile and run our program, we get:</p>
<pre tabindex="0"><code>$ ./hello
hello, world!
Bad system call (core dumped)
</code></pre><p>Nice! The kernel killed our program when we tried to use a system call
(<code>openat</code>) that we didn’t plan on!</p>
<aside>
  I wanted to figure out how to allow <code>openat</code> to only open a
  specific file name, but I couldn’t figure out how to compare string system
  call arguments. Thanks to Isaiah Bell for referring me to
  <a href="https://www.kernel.org/doc/html/v5.17/userspace-api/seccomp_filter.html#introduction">the explanation for why this isn’t possible</a>:
  to prevent <a href="https://cwe.mitre.org/data/definitions/367.html">time-of-check-time-of-use</a> problems.
</aside>
<p>Now let’s go back to how this all fits into Docker. Looking at <a href="https://github.com/moby/moby/blob/master/profiles/seccomp/default.json">Docker’s
default <code>seccomp</code> profile</a>, a lot of it starts to make more
sense. In fact, it looks like they’re using the exact same names from
<code>libseccomp</code> that we used in our program! If we search <a href="https://github.com/moby/moby/search?q=libseccomp">the moby source code for
<code>libseccomp</code></a>, we can see that it is indeed being used (via Go
bindings).</p>
<p>Let’s try to use a custom <code>seccomp</code> profile to prohibit programs in our Docker
container from listening for network connections. To start, I want to make sure
I can accept network connections, then modify my profile and watch it break. I
downloaded the default <code>seccomp</code> profile to use as a starting point for
tweaking, started a container with port 4000 open, then used <code>nc</code> to try
communicating from my host machine to a listener in the Docker container:</p>
<pre tabindex="0"><code>$ docker run --rm -it -p 4000:4000 --security-opt seccomp=seccomp.json alpine
/ # nc -l -p 4000
</code></pre><p>When I run <code>echo hi | nc 127.0.0.1 4000</code> in a separate terminal, my greeting is
printed by the netcat listener in the Docker container. Success! Now that I know
my basic TCP server works, let’s try blocking it with <code>seccomp</code>! To start
listening on a TCP port, I know that <code>nc</code> has to use the <code>socket</code>, <code>bind</code>, and
<code>listen</code> system calls (which we can verify using <code>strace</code>). I’ll try removing
them from the list of allowed system calls in the default profile, and run the
docker container again with the modified profile:</p>
<pre tabindex="0"><code>$ docker run --rm -it -p 4000:4000 --security-opt seccomp=seccomp.json alpine
/ # nc -l -p 4000
nc: socket(AF_INET,1,0): Operation not permitted
</code></pre><p>Awesome! We just used <code>seccomp</code> to control what our Docker container is allowed
to do!</p>
<p>I can imagine this might be helpful if you had an environment where security was
extremely important and wanted to really lock down your containers, but it’s
hard to imagine that writing custom <code>seccomp</code> profiles for every container in
your production environment is the best use of time without having some specific
situation you’re trying to address.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I’d heard about being able to run Docker containers with a custom security profile, but wasn’t really sure what that meant or what was happening behind the scenes, so I decided to do some experimentation to find out.
It turns out that the Linux kernel includes a feature called “secure computing mode,” or seccomp for short. Using seccomp lets you tell the kernel that you only expect your program to use a specific set of system calls, and if your program makes any system calls that aren’t in your approved list, the kernel should kill your program.
    </summary>
  </entry>
  
  <entry>
    <title type="html">How to Add Row Level Security to Views in PostgreSQL</title>
    <link href="https://www.benburwell.com/posts/row-level-security-postgresql-views/" rel="alternate" type="text/html" title="How to Add Row Level Security to Views in PostgreSQL" />
    <published>2020-04-02T00:00:00Z</published>
    <updated>2020-04-02T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/row-level-security-postgresql-views/</id>
    <content type="html" xml:base="/posts/row-level-security-postgresql-views/">
      <![CDATA[
        <p>Recently, I needed to store some customer-specific data in a PostgreSQL database
and grant customers access to only their data in the shared tables. Fortunately,
PostgreSQL has support for row level security in conjunction with its RBAC model
which helps us do exactly that.</p>
<p>While row level security does exactly what we need it to for tables, I ran into
a challenge when I needed to apply the same row level security to views built
from the tables: row level security is <em>only</em> available on tables, not on views!
Luckily, I was able to find a way to accomplish what I needed to and learned
some more about Postgres along the way.</p>
<p><strong>How to follow along in a Docker “lab” with our schema and dummy data:</strong></p>
<pre><code># Run the docker container:
$ docker run --rm --detach --name rlslab benburwell/postgres-rls-lab

# Connect to the database in the container using psql:
$ docker exec -it rlslab psql -U postgres

# Remember to stop the container when you’re done!
$ docker stop rlslab
</code></pre>
<p><strong>Back to the good stuff:</strong></p>
<p>Let’s start off by creating some tables that we’ll store customer-specific data
in. To grant our customers access to only their data in these tables, we’ll be
creating a role for each customer, e.g. <code>customer_a</code>, <code>customer_b</code>, and so on,
and we’ll include a <code>customer_user</code> column on each table that specifies the role
which should have access to that row:</p>
<pre><code>CREATE TABLE milestones (
  id serial primary key,
  customer_user varchar,
  name varchar
);

CREATE TABLE milestone_events (
  milestone_id int,
  customer_user varchar,
  name varchar
);
</code></pre>
<p>Now, we’ll create the customer users. To simplify management, we can create a
generic <code>customer</code> role that has the access we want each customer to have, and
then just grant that role to new customers as we onboard them.</p>
<pre><code>CREATE ROLE customer;
GRANT SELECT ON milestones TO customer;
GRANT SELECT ON milestone_events TO customer;
</code></pre>
<p>Next, we’ll create our individual customer roles and grant them the privileges
from the generic <code>customer</code> role we just created:</p>
<pre><code>CREATE ROLE customer_a;
CREATE ROLE customer_b;
GRANT customer TO customer_a, customer_b;
</code></pre>
<p>Next, let’s populate our <code>milestones</code> and <code>milestone_events</code> tables with some
dummy data:</p>
<pre><code>postgres=# SELECT * FROM milestones;
 id | customer_user |          name
----+---------------+---------------------------
  1 | customer_a    | A great milestone
  2 | customer_a    | Another milestone
  3 | customer_b    | Customer B milestone
  4 | customer_c    | Spooky invisible milestone

postgres=# SELECT * FROM milestone_events;
 milestone_id | customer_user |      name
--------------+---------------+----------------
            1 | customer_a    | First task
            1 | customer_a    | Second task
            2 | customer_a    | Another task
            3 | customer_b    | B event
            4 | customer_c    | Invisible task
</code></pre>
<p>Now, we’ll add the row-level security policies to these tables so that customer
users only have access to the appropriate rows in these tables:</p>
<pre><code>postgres=# ALTER TABLE milestones ENABLE ROW LEVEL SECURITY;
ALTER TABLE
postgres=# CREATE POLICY customer_access ON milestones
postgres-# FOR SELECT
postgres-# USING (customer_user = current_user);
CREATE POLICY
</code></pre>
<p>Let’s switch over to the <code>customer_a</code> role and check out the results:</p>
<pre><code>postgres=# set role customer_a;
postgres=&gt; select * from milestones;
 id | customer_user |       name
----+---------------+-------------------
  1 | customer_a    | A great milestone
  2 | customer_a    | Another milestone
</code></pre>
<p>Nice! Because of our row-level security policy on the <code>milestones</code> table, we
only see the rows where <code>customer_user</code> matches our current user, <code>customer_a</code>.</p>
<p>It would be really nice to create a view for these tables so that we can see all
the events with their related milestone names. Let’s jump back to the <code>postgres</code>
role and create the view:</p>
<pre><code>postgres=# CREATE VIEW milestone_events_view AS
postgres-# SELECT milestone_id, m.name as milestone_name, e.name as event_name
postgres-# FROM milestone_events e
postgres-# JOIN milestones m ON e.milestone_id = m.id;
CREATE VIEW
postgres=# GRANT SELECT ON milestone_events_view TO customer;
GRANT
</code></pre>
<p>Let’s switch back over to our <code>customer_a</code> role and take a look:</p>
<pre><code>postgres=&gt; SELECT * FROM milestone_events_view;
 milestone_id |       milestone_name       |   event_name
--------------+----------------------------+----------------
            1 | A great milestone          | First task
            1 | A great milestone          | Second task
            2 | Another milestone          | Another task
            3 | Customer B milestone       | B event
            4 | Spooky invisible milestone | Invisible task
</code></pre>
<p>Whoa! We shouldn’t be able to see all these other customers’ data! That was the
whole point of the row level security policy we set up! As it turns out,
PostgreSQL views by default adhere to the permissions of their <em>owner</em> (in this case
the <code>postgres</code> superuser) rather than the current user.</p>
<p>How can we fix this? Changing the owner of the view wouldn’t help us because
then all the customer users would just see <code>customer_a</code>’s data.</p>
<p>My solution was to create a function that does the selection. In Postgres,
functions can either be run with the privileges of the user who owns them (by
specifying <code>SECURITY DEFINER</code>), or as the user calling them (with <code>SECURITY INVOKER</code>).</p>
<pre><code>CREATE FUNCTION customer_milestone_events()
RETURNS TABLE (
  milestone_id int,
  milestone_name varchar,
  event_name varchar
)
LANGUAGE sql
SECURITY INVOKER
AS $$
  SELECT milestone_id, m.name AS milestone_name, e.name AS event_name
  FROM milestone_events e
  JOIN milestones m ON e.milestone_id = m.id
$$;
</code></pre>
<p>In order to make the results conveniently available as a view, we can create a
view based on this function:</p>
<pre><code>CREATE VIEW pub_milestone_events AS SELECT * FROM customer_milestone_events();
GRANT SELECT ON pub_milestone_events TO customer;
</code></pre>
<p>Now, when we switch over to our <code>customer_a</code> role and query our new view, we
only see the rows we’re supposed to see:</p>
<pre><code>postgres=&gt; select * from pub_milestone_events ;
 milestone_id |  milestone_name   |  event_name
--------------+-------------------+--------------
            1 | A great milestone | First task
            1 | A great milestone | Second task
            2 | Another milestone | Another task
</code></pre>
<p>And as <code>customer_b</code>:</p>
<pre><code>postgres=&gt; select * from pub_milestone_events ;
 milestone_id |    milestone_name    | event_name
--------------+----------------------+------------
            3 | Customer B milestone | B event
</code></pre>
<p>Tada! Row level security on views in PostgreSQL.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      Recently, I needed to store some customer-specific data in a PostgreSQL database and grant customers access to only their data in the shared tables. Fortunately, PostgreSQL has support for row level security in conjunction with its RBAC model which helps us do exactly that.
While row level security does exactly what we need it to for tables, I ran into a challenge when I needed to apply the same row level security to views built from the tables: row level security is only available on tables, not on views!
    </summary>
  </entry>
  
  <entry>
    <title type="html">MIG welding</title>
    <link href="https://www.benburwell.com/posts/welding/" rel="alternate" type="text/html" title="MIG welding" />
    <published>2020-02-13T00:00:00Z</published>
    <updated>2020-02-13T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/welding/</id>
    <content type="html" xml:base="/posts/welding/">
      <![CDATA[
        <p>Recently, I took a MIG welding class at <a href="https://artisansasylum.com">Artisan’s
Asylum</a> in Somerville, MA. I wanted to document what
I learned so that I can refer back to it in the future, so here it is!</p>
<h2 id="safety">Safety</h2>
<p>There are four primary hazards:</p>
<ol>
<li><strong>Burns.</strong> You’re dealing with liquid metal, so the work piece will remain
hot even after you finish a weld. Also, small balls of molten steel will fly
away from the work area and can burn through clothing and footwear, or ignite
flammable objects nearby. Precautions: welding jacket, welding gloves, face
mask, eye protection. Scan the surrounding area for fire hazards before
welding.</li>
<li><strong>Electric shock.</strong> MIG welding is an electrical process in which current is
passed between the welding wire and the ground clamp. Precautions: don’t weld
in damp areas.</li>
<li><strong>Radiation.</strong> Ultraviolet light emitted from the arc can damage eyesight.
Precautions: wear a welding face shield, preferably an auto-darkening one. A
small amount of exposure still occurs during the delay between the arc flash
and the sensor activating, but that’s mostly a concern if you spend 40 hours
a week welding.</li>
<li><strong>Asphyxiation.</strong> MIG welding typically uses a gas mix called C25, a mix of
75% argon and 25% carbon dioxide, to displace oxygen from the work area.
Oxygen would otherwise oxidize the molten steel and prevent a solid weld from
forming. Because the gas mix displaces oxygen, welding in an enclosed area
could result in hypoxia. Precautions: weld in a well-ventilated area. The
welding shop at Artisan’s Asylum has an exhaust system.</li>
</ol>
<h2 id="mechanics">Mechanics</h2>
<p>There are two primary controls on the welding machine: voltage (a proxy for
temperature) and feed speed. There’s a chart on the welding machine with
recommended settings for types and thicknesses of sheet metal. Some machines
have an “auto feed” setting which tends to work fairly well.</p>
<p>Don’t weld anything galvanized, as the zinc will vaporize and can cross the
blood-brain barrier causing neurological damage.</p>
<p>There are three common gauges of welding wire: 0.023&quot;, 0.030&quot;, and 0.035&quot;.
0.030&quot; is a good general purpose wire. When you switch out a reel, be sure that the
feed rollers are fitted for the correct gauge. The number facing out on the reel
is the groove in use, regardless of whether it’s actually on the same side of
the number or the opposite side.</p>
<p>To reduce friction in the sleeve that guides the welding wire through the gas
cable, avoid sharp loops in the cable, much as you’d avoid kinking a garden hose.</p>
<p>You want to keep the nozzle about 3/8&quot; away from the work piece.</p>
<p>After a weld, slag and ash may accumulate on the surface. This can be brushed
off with a wire brush or ground down for a cleaner surface.</p>
<p>The welding tip is where the current is transferred from the electrode running
down the cable into the welding wire. It’s relatively easy for the tip to become
damaged and require replacement; it’s a 15-cent consumable part so not a big
deal. To replace, pull off (don’t unscrew) the sheath from the end of the
nozzle, unscrew the tip, and screw a new one in. The inner diameter of the
sheath may also accumulate slag buildup over time which can easily be cleaned
out using the tapered end of MIG welding pliers.</p>
<p>To warn people nearby, announce “welding” before starting a weld.</p>
<p>Two directions for welding: push and drag. Not much difference between them
other than what you can see: drag allows you to see the bead you’re laying while
push allows you to see where you’re going.</p>
<h2 id="tack-welds">Tack Welds</h2>
<p>Useful for tacking a piece in place before laying a bead. With about 3/8&quot; of
wire protruding from the tip, place the tip against the work surface, and
depress the trigger for about 1 second.</p>
<h2 id="beads">Beads</h2>
<p>A bead can be produced by dragging the end of the welding wire along the work
piece. The bead should be about twice as wide as the work piece is thick; e.g.
for 1/8&quot; steel the bead should be about 1/4&quot; wide. The width can be controlled
by regulating the speed at which you drag the nozzle across the work surface.</p>
<p>A nicer bead can be laid by using a circular motion, which also produces the
aesthetically pleasing waves.</p>
<p>The height of the bead should be fairly low, as the larger the angle between the
sheet metal and the bead, the less sturdy the weld will be.</p>
<h2 id="fillets">Fillets</h2>
<p>This is basically just a bead laid to join two pieces at a right angle. Use the
same basic technique, but you’ll be bumping the nozzle up against the side and
bottom pieces in order to be close enough to the work area right at the join.</p>
<h2 id="series-of-tacks">“Series of tacks”</h2>
<p>Can be useful when welding stock so thin that laying a bead might melt right
through it. This works by using the thermal capacity of the previously welded tack
to help absorb some of the heat. Place the wire right up against the previous
tack for optimal heat dissipation.</p>
<h2 id="fill-in">Fill-in</h2>
<p>You can fix a hole by placing a bunch of dots inside it. It won’t look super
pretty, you’ll almost certainly need to grind it down afterwards, and it won’t
be very strong. You could use it as an aesthetic (rather than structural) fix
e.g. if it’ll be painted over afterwards.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      Recently, I took a MIG welding class at Artisan’s Asylum in Somerville, MA. I wanted to document what I learned so that I can refer back to it in the future, so here it is!
    </summary>
  </entry>
  
  <entry>
    <title type="html">How the Dewey Decimal Classification Works</title>
    <link href="https://www.benburwell.com/posts/how-dewey-decimal-works/" rel="alternate" type="text/html" title="How the Dewey Decimal Classification Works" />
    <published>2020-02-11T00:00:00Z</published>
    <updated>2020-02-11T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/how-dewey-decimal-works/</id>
    <content type="html" xml:base="/posts/how-dewey-decimal-works/">
      <![CDATA[
        <p>The Dewey Decimal Classification (DDC) is widely used in libraries to organize
their collections. I think a lot of people have probably used the DDC to find a
book in a library, and a lot of people generally know how it works: number
ranges correspond to high-level topics, with more numbers in the middle to fill
in more specific subjects. You might be familiar with the table of main classes:</p>
<table>
  <tr>
    <td>000</td>
    <td>Computers and general information</td>
  </tr>
  <tr>
    <td>100</td>
    <td>Philosophy and Psychology</td>
  </tr>
  <tr>
    <td>200</td>
    <td>Religion</td>
  </tr>
  <tr>
    <td>300</td>
    <td>Social Sciences</td>
  </tr>
  <tr>
    <td>400</td>
    <td>Language</td>
  </tr>
  <tr>
    <td>500</td>
    <td>Math and Science</td>
  </tr>
  <tr>
    <td>600</td>
    <td>Technology</td>
  </tr>
  <tr>
    <td>700</td>
    <td>Art</td>
  </tr>
  <tr>
    <td>800</td>
    <td>Literature</td>
  </tr>
  <tr>
    <td>900</td>
    <td>History and Geography</td>
  </tr>
</table>
<p>I’ve always been interested in how the rest of the digits were decided on, so I
decided to learn more! Surprisingly, it’s a bit challenging to find references
on the DDC because it’s actually sort of a proprietary system. It’s managed and
published by the Online Computer Library Center (OCLC), and they’re quite happy
to sell you the DDC or access to WebDewey for many hundreds of dollars.</p>
<p>After some further digging, I came across an <a href="http://nlc.nebraska.gov/handouts/classmaterials/ddcsummer2014/dewey.html">online class on the Dewey Decimal
Classification</a> from the Nebraska Library Commission. It’s three sessions
of about an hour each. And now I know a lot more about how the DDC works!</p>
<p>The DDC was created in the 1870s by Melvil Dewey, who <a href="https://en.wikipedia.org/wiki/Melvil_Dewey#Controversies">was a problematic
person</a>, and as a result the DDC has <a href="https://en.wikipedia.org/wiki/Dewey_Decimal_Classification#Influence_and_criticism">its share of issues</a>. For
these reasons and others, many libraries are moving away from the DDC to other
systems such as the <a href="https://en.wikipedia.org/wiki/Library_of_Congress_Classification">Library of Congress classification system</a> or the
<a href="https://bisg.org/page/BISACSubjectCodes">BISAC subject codes</a> used by many booksellers. Though it might be in
decline, it’s widely-enough used that I still wanted to learn more about it.</p>
<p>The DDC organizes works into one of the ten main classes shown above. Each class
has ten divisions (the second digit), and each division has ten sections (the
third digit). There are further subdivisions that can be applied for more
specific works. Overall, this forms a tree structure in which each subsequent
digit traverses down the tree to a more specific topic. Works are classified
into the node which is as specific as possible, so in general a shorter number
or a number with fewer non-zero terminal digits will refer to a work that covers
a broader range of topics.</p>
<p>In order to properly class works, there are two primary variables to consider:
the subject/topic and the discipline. For example, you might class a work on
dogs either in 599.77 (Natural sciences and mathematics &gt; Animals &gt; Mammals &gt;
Carnivores &gt; Dog Family), or in 636.7 (Technology &gt; Agriculture and related
technologies &gt; Animal husbandry &gt; Dogs), depending on whether it was a book
about the physiology of dogs or on keeping dogs as pets.</p>
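<p>To make the tree structure concrete, here’s a minimal Python sketch that lists the chain of numbers leading down to a given class. The function name <code>ddc_path</code> is made up for this example, and it only handles plain schedule numbers like the ones above:</p>
<pre><code>def ddc_path(number):
    """List the DDC numbers from the main class down to `number`."""
    digits = number.replace(".", "")
    path = []
    for depth in range(1, len(digits) + 1):
        # Pad short prefixes with zeros: "6" is main class 600, "63" is 630.
        padded = digits[:depth].ljust(3, "0")
        head, tail = padded[:3], padded[3:]
        entry = head + "." + tail if tail else head
        # Skip repeats caused by zero digits (e.g. 000 as class and division).
        if not path or path[-1] != entry:
            path.append(entry)
    return path


# Labels copied from the dog example above.
labels = ["Technology", "Agriculture and related technologies",
          "Animal husbandry", "Dogs"]
for step, label in zip(ddc_path("636.7"), labels):
    print(step, label)
</code></pre>
<p>Each extra digit moves one level down the tree, so 636.7 sits four levels deep: 600, 630, 636, 636.7.</p>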
<p>Of course, sometimes a work covers multiple topics or even multiple disciplines.
The DDC has rules which dictate how these situations should be handled. (To
continue the dog example, if you look in 599.77, there is a note which says
“class interdisciplinary works on dogs in 636.7,” so if a work covered both the
biology and raising of dogs, it should be classed in 636.7).</p>
<p>If you were to buy a hard copy of the DDC, you’d notice that there are a few
different parts. The main part that people think of as the DDC is called the
“schedules.” This is the big list of all the top-level numbers, arranged into
chapters for each main class. There’s also an introduction, which has rules for
deciding where works should be classed. For example the rule of fuller treatment
says that if a work covers two or more topics, but covers one topic more fully
than all the others, the work should be classed under that topic. There’s also
the rule of two, which states that if a work covers two topics fairly equally,
it should be classed under the lower number. For example, a work with equal
treatment of French bulldogs (636.72) and Welsh corgis (636.737) should be
classed under the lower number, 636.72.</p>
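<p>As a small sketch of the rule of two as I’ve described it (the function name is my own, and this ignores every other rule in the introduction), picking the number comes down to taking the smaller of two decimals:</p>
<pre><code>from decimal import Decimal


def rule_of_two(first, second):
    """Pick the number to class under when two topics get equal treatment."""
    # Compare as decimals so that 636.72 sorts before 636.737.
    return min(first, second, key=Decimal)


print(rule_of_two("636.737", "636.72"))  # prints 636.72
</code></pre>
<p>Comparing as decimals rather than by number of digits matters here: 636.737 is the longer number, but 636.72 is the lower one.</p>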
<p>In addition to the introduction and the schedules, there’s also the manual which
helps you resolve some specific situations (usually you’ll see a note in the
schedules like “See manual 636.72-636.75” that points you to go there), the
relative index, and the tables. The relative index is generally the starting
point for classifying a work. You can look up a topic alphabetically, and you’ll
be pointed to all the different possible classifications. And finally, the
tables, which help classify works more specifically.</p>
<p>This introduces a topic called “number building.” The DDC doesn’t actually
contain a specific entry for each possible topic, but relies on adding standard
subdivisions to numbers listed in the schedules. Table 1 contains the standard
subdivisions, which you can add as a suffix to pretty much any number you find
in the schedules. The standard subdivisions include:</p>
<table>
  <tr>
    <td>&mdash;01</td>
    <td>Philosophy and theory</td>
  </tr>
  <tr>
    <td>&mdash;02</td>
    <td>Miscellany</td>
  </tr>
  <tr>
    <td>&mdash;03</td>
    <td>Dictionaries, encyclopedias, concordances</td>
  </tr>
  <tr>
    <td>&mdash;04</td>
    <td>Special topics</td>
  </tr>
  <tr>
    <td>&mdash;05</td>
    <td>Serial publications</td>
  </tr>
  <tr>
    <td>&mdash;06</td>
    <td>Organizations and management</td>
  </tr>
  <tr>
    <td>&mdash;07</td>
    <td>Education, research, and related topics</td>
  </tr>
  <tr>
    <td>&mdash;08</td>
    <td>Groups of people</td>
  </tr>
  <tr>
    <td>&mdash;09</td>
    <td>History, geographic treatment, biography</td>
  </tr>
</table>
<p>For example, an encyclopedia of programming languages could be classed under
<strong>005</strong>
(Computer programming, programs, data), <strong>.1</strong> (programming),
<strong>3</strong> (programming languages), <strong>—03</strong>
(Dictionaries, encyclopedias, concordances) to yield the number 005.1303. The
book <em>Cracking the Coding Interview</em>, which is about the job of
programming, could be classed under 005.1023, again using 005.1 (programming)
and adding —023, the standard subdivision for “the subject as a
profession, occupation, hobby.”</p>
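<p>The basic mechanics of number building can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name is made up, and it ignores any special instructions that the schedules give for a particular number.</p>
<pre><code>def add_standard_subdivision(base, subdivision):
    """Append a Table 1 subdivision such as "03" to a schedule number."""
    digits = base.replace(".", "") + subdivision
    # The decimal point always goes after the third digit.
    return digits[:3] + "." + digits[3:]


print(add_standard_subdivision("005.13", "03"))   # prints 005.1303
print(add_standard_subdivision("005.1", "023"))   # prints 005.1023
</code></pre>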
<p>There are four tables in total; table 2 is used in conjunction with the 09
standard subdivision from table 1, e.g. a book about the architecture of Boston
might be classed under 720.9744, with 720 being the architecture, 09 being the
standard subdivision for geographic treatment, and 744 being the suffix from
table 2 for Massachusetts. You might expect the number to be 720.09744, and it
would be, except that in the schedules under 720, we are instructed to put the
standard subdivisions in .1 through .9.</p>
<p>Table 3 contains subdivisions for literatures and literary forms and is only
used with the main class 800 Literature. For example, a collection of American
plays might be classed as 81 (American literature in English) + 2 (the
subdivision from table 3 for Drama) to get 812 as the result.</p>
<p>Finally, table 4 contains subdivisions for languages, and is only used with the
400 Language main class. It’s used to break down specific attributes of
language, such as —3 for dictionaries. So Webster’s dictionary would be
classed as <strong>420</strong> (English and Old English) + <strong>3</strong>
(dictionaries) = <strong>423</strong>.</p>
<p>There’s a lot to the system, and while there is still a lot I don’t know, I now
know a lot more about how it works than I did previously! If I got something
wrong here, please email me about it! I’d love to learn more.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      The Dewey Decimal Classification (DDC) is widely used in libraries to organize their collections. I think a lot of people have probably used the DDC to find a book in a library, and a lot of people generally know how it works: number ranges correspond to high-level topics, with more numbers in the middle to fill in more specific subjects. You might be familiar with the table of main classes:
    </summary>
  </entry>
  
  <entry>
    <title type="html">Solving the SQL Murder Mystery</title>
    <link href="https://www.benburwell.com/posts/sql-murder-mystery/" rel="alternate" type="text/html" title="Solving the SQL Murder Mystery" />
    <published>2019-12-20T00:00:00Z</published>
    <updated>2019-12-20T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/sql-murder-mystery/</id>
    <content type="html" xml:base="/posts/sql-murder-mystery/">
      <![CDATA[
        <p>I saw this <a href="https://github.com/NUKnightLab/sql-mysteries">SQL Murder Mystery</a>
appear on Hacker News recently, thought it sounded fun, and figured I’d do a
quick write-up of how I worked through it.</p>
<p>If you want to follow along, go ahead and <a href="https://static.benburwell.com/blog/sql-murder-mystery.db">download the SQLite database</a>
(which is copyright NUKnightLab and redistributed here under the <a href="https://opensource.org/licenses/MIT">MIT
license</a>). You’ll need some kind of SQLite client to interact with it (I
just used the <code>sqlite3</code> CLI tool).</p>
<p>In addition to the database, it’s very helpful to start with a prompt:</p>
<blockquote>
<p>A crime has taken place and the detective needs your help. The detective gave
you the crime scene report, but you somehow lost it. You vaguely remember that
the crime was a murder that occurred sometime on Jan. 15, 2018 and that it
took place in SQL City. Start by retrieving the corresponding crime scene
report from the police department’s database. If you want to get the most out
of this mystery, try to work through it only using your SQL environment and
refrain from using a notepad.</p>
</blockquote>
<p>Let’s start by seeing what tables are available. The <code>sqlite3</code> CLI uses
meta-commands that start with a dot, like this:</p>
<pre><code>sqlite&gt; .tables
crime_scene_report      get_fit_now_check_in    interview
drivers_license         get_fit_now_member      person
facebook_event_checkin  income                  solution
</code></pre>
<p>Okay, let’s start with finding our crime scene report. First, we’ll need to know
what the data looks like. We can learn about this with the <code>.schema</code> command:</p>
<pre><code>sqlite&gt; .schema crime_scene_report
CREATE TABLE crime_scene_report (
        date integer,
        type text,
        description text,
        city text
    );
</code></pre>
<p>Okay, seems pretty straightforward. The only thing I’m not quite sure about is
how the date is being represented -- it’s just stored as an integer. A UNIX
timestamp perhaps? Let’s sample the data:</p>
<pre><code>sqlite&gt; select date from crime_scene_report limit 5;
</code></pre>
<table>
  <tr><th>date</th></tr>
  <tr><td>20180115</td></tr>
  <tr><td>20180115</td></tr>
  <tr><td>20180115</td></tr>
  <tr><td>20180215</td></tr>
  <tr><td>20180215</td></tr>
</table>
<p>Okay, seems it’s just being stored as YYYYMMDD. Let’s take a crack at finding
the crime scene report! We know the type (murder) and the city (SQL City). Let’s
be generous with the date and assume it was sometime in January of 2018:</p>
<pre><code>sqlite&gt; select * from crime_scene_report
   ...&gt; where type = 'murder'
   ...&gt; and city = 'SQL City'
   ...&gt; and date between 20180101 and 20180131;
</code></pre>
<table>
  <tr>
    <th>date</th>
    <th>type</th>
    <th>description</th>
    <th>city</th>
  </tr>
  <tr>
    <td>20180115</td>
    <td>murder</td>
    <td>
      Security footage shows that there were 2 witnesses. The first witness
      lives at the last house on &quot;Northwestern Dr&quot;. The second
      witness, named Annabel, lives somewhere on &quot;Franklin Ave&quot;.
    </td>
    <td>SQL City</td>
  </tr>
</table>
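<p>As an aside, the <code>between</code> filter works on these integers because year-month-day order means numeric order matches calendar order. If you ever need a real date out of one of them, a minimal Python sketch (the helper name is made up) might look like this:</p>
<pre><code>from datetime import datetime


def parse_yyyymmdd(value):
    """Turn an integer like 20180115 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


print(parse_yyyymmdd(20180115))  # prints 2018-01-15
</code></pre>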
<p>Great, there’s only one row that matches our broad date criteria! Let’s see if
we can track down these witnesses. First, let’s see how the data we need is
structured:</p>
<pre><code>sqlite&gt; .schema person
CREATE TABLE person (
        id integer PRIMARY KEY,
        name text,
        license_id integer,
        address_number integer,
        address_street_name text,
        ssn integer,
        FOREIGN KEY (license_id) REFERENCES drivers_license(id)
    );
sqlite&gt; .schema interview
CREATE TABLE interview (
        person_id integer,
        transcript text,
        FOREIGN KEY (person_id) REFERENCES person(id)
    );
</code></pre>
<p>Okay, so we need to find the two rows in the person table, and then use their
ids to cross reference their interview text. This is “the big idea” with
relational databases, joining data in several tables based on something they
have in common.</p>
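<p>If joins are new to you, here’s a tiny self-contained sketch using Python’s built-in <code>sqlite3</code> module. The tables are cut-down versions of the real ones and the rows are made up for the example; they aren’t taken from the mystery database.</p>
<pre><code>import sqlite3


def interviews_with_names(conn):
    """Pair each interview transcript with the name of the person who gave it."""
    return conn.execute(
        "select person.name, interview.transcript "
        "from interview "
        "join person on person.id = interview.person_id "
        "order by person.id"
    ).fetchall()


# Throwaway in-memory database with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table person (id integer primary key, name text);
    create table interview (person_id integer, transcript text);
    insert into person values (1, 'Alice Example'), (2, 'Bob Example');
    insert into interview values (2, 'I saw nothing.'), (1, 'I heard a noise.');
""")

for name, transcript in interviews_with_names(conn):
    print(name, "said:", transcript)
</code></pre>
<p>The <code>on</code> clause is what ties the two tables together: each interview row is matched to the person row whose <code>id</code> equals its <code>person_id</code>.</p>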
<p>We’ll start with the witness who lives on Northwestern Drive.  We know that they
live in “the last house,” which presumably has the highest house number on that
street. We can easily find this by first filtering for only people who live on
Northwestern Drive, then ordering those results by house number in descending
order, and only showing the first result:</p>
<pre><code>sqlite&gt; select * from person
   ...&gt; where address_street_name = 'Northwestern Dr'
   ...&gt; order by address_number desc
   ...&gt; limit 1;
</code></pre>
<table>
  <tr>
    <th>id</th>
    <th>name</th>
    <th>license_id</th>
    <th>address_number</th>
    <th>address_street_name</th>
    <th>ssn</th>
  </tr>
  <tr>
    <td>14887</td>
    <td>Morty Schapiro</td>
    <td>118009</td>
    <td>4919</td>
    <td>Northwestern Dr</td>
    <td>111564949</td>
  </tr>
</table>
<p>Great! Now let’s find Annabel. We can use SQL’s <code>LIKE</code> operator to match a
partial name, along with the name of their street:</p>
<pre><code>sqlite&gt; select * from person
   ...&gt; where name like 'Annabel%'
   ...&gt; and address_street_name = 'Franklin Ave';
</code></pre>
<table>
  <tr>
    <th>id</th>
    <th>name</th>
    <th>license_id</th>
    <th>address_number</th>
    <th>address_street_name</th>
    <th>ssn</th>
  </tr>
  <tr>
    <td>16371</td>
    <td>Annabel Miller</td>
    <td>490173</td>
    <td>103</td>
    <td>Franklin Ave</td>
    <td>318771143</td>
  </tr>
</table>
<p>Okay, so we’ve got our person IDs: <code>14887</code> and <code>16371</code>. I think we’re going to
want these IDs in a bunch of upcoming queries, so let’s help our future selves
out by saving their IDs as parameters (a sort of temporary variable):</p>
<pre><code>sqlite&gt; .parameter set $MORTY 14887
sqlite&gt; .parameter set $ANNABEL 16371
</code></pre>
<p>Let’s grab their interviews. To do this, we’ll put joins to use for the first
time so we can show their name rather than just their person ID. We’re selecting
records from the <code>interview</code> table, but <em>joining</em> matching records from the
<code>person</code> table, using the <code>person_id</code> column to match up the people.</p>
<pre><code>sqlite&gt; select person.name, interview.transcript
   ...&gt; from interview
   ...&gt; join person on person.id = interview.person_id
   ...&gt; where person_id in ($MORTY, $ANNABEL);
</code></pre>
<table>
  <tr>
    <th>name</th>
    <th>transcript</th>
  </tr>
  <tr>
    <td>Morty Schapiro</td>
    <td>
      I heard a gunshot and then saw a man run out. He had a &quot;Get Fit Now
      Gym&quot; bag. The membership number on the bag started with
      &quot;48Z&quot;.  Only gold members have those bags. The man got into a
      car with a plate that included &quot;H42W&quot;.
    </td>
  </tr>
  <tr>
    <td>Annabel Miller</td>
    <td>
      I saw the murder happen, and I recognized the killer from my gym when I
      was working out last week on January the 9th.
    </td>
  </tr>
</table>
<p>Okay, we’ve got tons of info now! Since the car and bag might not belong to the
killer, I think our best lead for narrowing things down is to see all the people
who crossed paths with Annabel at the gym on January 9th, 2018. Let’s see what
those tables look like:</p>
<pre><code>sqlite&gt; .schema get_fit_now_check_in
CREATE TABLE get_fit_now_check_in (
        membership_id text,
        check_in_date integer,
        check_in_time integer,
        check_out_time integer,
        FOREIGN KEY (membership_id) REFERENCES get_fit_now_member(id)
    );
sqlite&gt; .schema get_fit_now_member
CREATE TABLE get_fit_now_member (
        id text PRIMARY KEY,
        person_id integer,
        name text,
        membership_start_date integer,
        membership_status text,
        FOREIGN KEY (person_id) REFERENCES person(id)
    );
</code></pre>
<p>Alright, time to look for some check-ins! We could do this in two separate
queries, one to find Annabel’s Get Fit Now member ID by using her <code>person_id</code>,
and a second query to find her check-ins using her <code>membership_id</code>, but we can
also use a sub-query to do this in one shot:</p>
<pre><code>sqlite&gt; select check_in_time, check_out_time
   ...&gt; from get_fit_now_check_in
   ...&gt; where check_in_date = 20180109
   ...&gt; and membership_id = (
   ...&gt;   select id
   ...&gt;   from get_fit_now_member
   ...&gt;   where person_id = $ANNABEL);
</code></pre>
<table>
  <tr>
    <th>check_in_time</th>
    <th>check_out_time</th>
  </tr>
  <tr>
    <td>1600</td>
    <td>1700</td>
  </tr>
</table>
<p>Looks like Annabel was at the gym from 4pm to 5pm on the 9th. Since we’re
looking for someone who overlapped with Annabel at the gym, we’re looking for
someone who arrived before 5pm and left after 4pm. Again, we’ll join some tables
together here so we can grab their names and person IDs right away, not just
their membership numbers:</p>
<pre><code>sqlite&gt; select person.id, person.name, get_fit_now_member.id,
   ...&gt;   get_fit_now_check_in.check_in_time,
   ...&gt;   get_fit_now_check_in.check_out_time
   ...&gt; from get_fit_now_check_in
   ...&gt; join get_fit_now_member on get_fit_now_member.id = membership_id
   ...&gt; join person on person.id = person_id
   ...&gt; where check_in_date = 20180109
   ...&gt; and check_in_time &lt;= 1700 and check_out_time &gt;= 1600;
</code></pre>
<table>
  <tr>
    <th>id</th>
    <th>name</th>
    <th>id</th>
    <th>check_in_time</th>
    <th>check_out_time</th>
  </tr>
  <tr>
    <td>28819</td>
    <td>Joe Germuska</td>
    <td>48Z7A</td>
    <td>1600</td>
    <td>1730</td>
  </tr>
  <tr>
    <td>67318</td>
    <td>Jeremy Bowers</td>
    <td>48Z55</td>
    <td>1530</td>
    <td>1700</td>
  </tr>
  <tr>
    <td>16371</td>
    <td>Annabel Miller</td>
    <td>90081</td>
    <td>1600</td>
    <td>1700</td>
  </tr>
</table>
<p>Interesting, there were only two other gym members who were checked in for a
period overlapping with Annabel on the 9th. Let’s save their IDs as well:</p>
<pre><code>sqlite&gt; .parameter set $JOE 28819
sqlite&gt; .parameter set $JEREMY 67318
</code></pre>
<p>Their member numbers both start with 48Z; let’s take a look at their vehicles,
presumably in the <code>drivers_license</code> table:</p>
<pre><code>sqlite&gt; .schema drivers_license
CREATE TABLE drivers_license (
        id integer PRIMARY KEY,
        age integer,
        height integer,
        eye_color text,
        hair_color text,
        gender text,
        plate_number text,
        car_make text,
        car_model text
    );

sqlite&gt; select person.id, person.name, drivers_license.*
   ...&gt; from person
   ...&gt; join drivers_license on drivers_license.id = person.license_id
   ...&gt; where person.id in ($JOE, $JEREMY);
</code></pre>
<table>
  <tr>
    <th>id</th>
    <th>name</th>
    <th>id</th>
    <th>age</th>
    <th>height</th>
    <th>eye_color</th>
    <th>hair_color</th>
    <th>gender</th>
    <th>plate_number</th>
    <th>car_make</th>
    <th>car_model</th>
  </tr>
  <tr>
    <td>67318</td>
    <td>Jeremy Bowers</td>
    <td>423327</td>
    <td>30</td>
    <td>70</td>
    <td>brown</td>
    <td>brown</td>
    <td>male</td>
    <td>0H42W2</td>
    <td>Chevrolet</td>
    <td>Spark LS</td>
  </tr>
</table>
<p>So only Jeremy Bowers has a driver’s license. And his car’s license plate does
contain H42W, so it looks like we’ve found the killer! According to the
instructions in the GitHub repository, we should insert our answer into the
<code>solution</code> table, then query it:</p>
<pre><code>sqlite&gt; insert into solution values (1, 'Jeremy Bowers');
sqlite&gt; select value from solution;
</code></pre>
<table>
  <tr>
    <th>value</th>
  </tr>
  <tr>
    <td>
      Congrats, you found the murderer! But wait, there&#39;s more... If you
      think you&#39;re up for a challenge, try querying the interview transcript
      of the murderer to find the real villian behind this crime. If you feel
      especially confident in your SQL skills, try to complete this final step
      with no more than 2 queries.
    </td>
  </tr>
</table>
<p>Aha! We did correctly identify Jeremy Bowers. Let’s see if we can connect the
dots to find the mastermind! First, we’ll grab the killer’s (Jeremy’s) interview
transcript:</p>
<pre><code>sqlite&gt; select transcript from interview where person_id = $JEREMY;
</code></pre>
<table>
  <tr>
    <th>transcript</th>
  </tr>
  <tr>
    <td>
      I was hired by a woman with a lot of money. I don&#39;t know her name but
      I know she&#39;s around 5&#39;5&quot; (65&quot;) or 5&#39;7&quot;
      (67&quot;). She has red hair and she drives a Tesla Model S. I know that
      she attended the SQL Symphony Concert 3 times in December 2017.
    </td>
  </tr>
</table>
<p>Alright, there goes one query... one more to make it count! We’re going to be
correlating data from a bunch of tables here. From person we get the name and
the keys into the other tables. From income (related by SSN) we get a figure
to sort by, since we don’t have an exact amount to filter on. From the drivers
licenses we can grab height, hair color, gender, and car make/model. It’s a
bit of a risk to filter by Facebook check-ins to the SQL Symphony, since we
don’t know that she checked in at all, but we can include a count of her
symphony check-ins during December. Let’s get a reminder of what these tables
look like:</p>
<pre><code>sqlite&gt; .schema person
CREATE TABLE person (
        id integer PRIMARY KEY,
        name text,
        license_id integer,
        address_number integer,
        address_street_name text,
        ssn integer,
        FOREIGN KEY (license_id) REFERENCES drivers_license(id)
    );
sqlite&gt; .schema income
CREATE TABLE income (
        ssn integer PRIMARY KEY,
        annual_income integer
    );
sqlite&gt; .schema facebook_event_checkin
CREATE TABLE facebook_event_checkin (
        person_id integer,
        event_id integer,
        event_name text,
        date integer,
        FOREIGN KEY (person_id) REFERENCES person(id)
    );
sqlite&gt; .schema drivers_license
CREATE TABLE drivers_license (
        id integer PRIMARY KEY,
        age integer,
        height integer,
        eye_color text,
        hair_color text,
        gender text,
        plate_number text,
        car_make text,
        car_model text
    );
</code></pre>
<p>And assemble our final mega-query!</p>
<pre><code>sqlite&gt; select p.id, p.name, i.annual_income, dl.height, dl.hair_color,
   ...&gt; dl.gender, dl.car_make, dl.car_model, (
   ...&gt;   select count(*)
   ...&gt;   from facebook_event_checkin
   ...&gt;   where person_id = p.id
   ...&gt;   and event_name like '%symphony%'
   ...&gt;   and date between 20171201 and 20171231) as num_symphonies
   ...&gt; from person p
   ...&gt; join income i on i.ssn = p.ssn
   ...&gt; join drivers_license dl on dl.id = p.license_id
   ...&gt; where dl.height between 64 and 68
   ...&gt; and dl.hair_color like '%red%'
   ...&gt; and car_make like '%tesla%'
   ...&gt; and car_model like '%s%'
   ...&gt; order by i.annual_income desc;
</code></pre>
<table>
  <tr>
    <th>id</th>
    <th>name</th>
    <th>annual_income</th>
    <th>height</th>
    <th>hair_color</th>
    <th>gender</th>
    <th>car_make</th>
    <th>car_model</th>
    <th>num_symphonies</th>
  </tr>
  <tr>
    <td>99716</td>
    <td>Miranda Priestly</td>
    <td>310000</td>
    <td>66</td>
    <td>red</td>
    <td>female</td>
    <td>Tesla</td>
    <td>Model S</td>
    <td>3</td>
  </tr>
  <tr>
    <td>78881</td>
    <td>Red Korb</td>
    <td>278000</td>
    <td>65</td>
    <td>red</td>
    <td>female</td>
    <td>Tesla</td>
    <td>Model S</td>
    <td>0</td>
  </tr>
</table>
<p>Okay, so we actually got two results for red-haired people around 66&quot; tall who
make a lot of money and drive Tesla Model S’s. However, one of them attended the
symphony three times in December (and makes even more money), so I think we’ve
found the mastermind!</p>
<p>I didn’t include gender in the filter as I wasn’t sure how the data looked, and
I technically would’ve needed an additional query to discover that.</p>
<pre><code>sqlite&gt; insert into solution values (1, 'Miranda Priestly');
sqlite&gt; select value from solution;
</code></pre>
<table>
  <tr>
    <th>value</th>
  </tr>
  <tr>
    <td>
      Congrats, you found the brains behind the murder! Everyone in SQL City
      hails you as the greatest SQL detective of all time. Time to break out the
      champagne!
    </td>
  </tr>
</table>
<p>Hooray! I had a lot of fun playing through this, and would love to do another
similar puzzle again sometime.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I saw this SQL Murder Mystery appear on Hacker News recently, thought it sounded fun, and figured I’d do a quick write-up of how I worked through it.
If you want to follow along, go ahead and download the SQLite database (which is copyright NUKnightLab and redistributed here under the MIT license). You’ll need some kind of SQLite client to interact with it (I just used the sqlite3 CLI tool).
    </summary>
  </entry>
  
  <entry>
    <title type="html">(Almost) Pure CSS Material-like Text Fields</title>
    <link href="https://www.benburwell.com/posts/almost-pure-css-material-text-fields/" rel="alternate" type="text/html" title="(Almost) Pure CSS Material-like Text Fields" />
    <published>2019-09-19T00:00:00Z</published>
    <updated>2019-09-19T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/almost-pure-css-material-text-fields/</id>
    <content type="html" xml:base="/posts/almost-pure-css-material-text-fields/">
      <![CDATA[
        <p>Despite what you may believe from simply looking at this site, I’ve actually
done quite a bit of front-end development. A couple of years ago, I worked on a
project with a friend of mine. For part of the project, he’d designed the
behavior of a form control inspired by Material Design which I then built from
scratch. Recently, he asked me to remind him how I’d implemented it, and I
thought I’d take the opportunity to turn it into a blog post.</p>
<p>Here’s what it looks like:</p>
<style type="text/css">
/* Input container */
.form-group {
  position: relative;
  height: 52px;
  margin: 8px 0;
}

/* Bottom border widget, at rest */
.form-group::after {
  content: '';
  height: 2px;
  position: absolute;
  bottom: 0;
  left: 50%;
  background-color: #900;
  width: 0;
  transition-property: all;
  transition-duration: 0.15s;
  transition-timing-function: ease-out;
}

/* Bottom widget when focused */
.form-group.focused::after {
  width: 100%;
  left: 0;
}

/* The input itself */
.form-control {
  outline: none;
  border: none;
  display: block;
  background-color: transparent;
  position: absolute;
  top: 20px;
  height: 24px;
  font-size: 16px;
  width: 100%;
}

/* Label/Placeholder at rest */
.control-label {
  display: block;
  font-weight: 300;
  position: absolute;
  transition-duration: 0.15s;
  transition-property: all;
  transition-timing-function: ease-out;
  top: 26px;
  color: #666;
}

/* Placeholder disappears when the field is populated and blurred */
.form-group.populated:not(.focused) .control-label {
  top: 0;
  font-size: 12px;
  font-weight: 600;
  color: transparent;
}

/* Placeholder moves up above the input when the user is editing */
.form-group.focused .control-label {
  top: 0;
  font-size: 12px;
  font-weight: 600;
  color: #900;
}

.demo {
  background-color: white;
  color: black;
  position: relative;
  border: 2px solid black;
  padding: 2em;
  font-family: sans-serif;
}
</style>
<div class="demo">
  <div class="form-group" id="demo-group">
    <label class="control-label" for="demo-control">First Name</label>
    <input type="text" class="form-control" id="demo-control">
  </div>
</div>
<script type="text/javascript">
(function() {
  function setPopulated() {
    const group = document.getElementById('demo-group');
    if (this.value) {
      group.classList.add('populated');
    } else {
      group.classList.remove('populated');
    }
  }

  function setFocused(focused) {
    return () => {
      const group = document.getElementById('demo-group');
      if (focused) {
        group.classList.add('focused');
      } else {
        group.classList.remove('focused');
      }
    }
  }

  const input = document.getElementById('demo-control');
  input.addEventListener('input', setPopulated);
  input.addEventListener('paste', setPopulated);
  input.addEventListener('focus', setFocused(true));
  input.addEventListener('blur', setFocused(false));
})()
</script>
<p>It’s not <em>quite</em> pure CSS, but it’s pretty close. Let’s think about how this is
put together.</p>
<p>At a high level, the appearance of the text field at any given moment is the
result of two CSS classes, <code>focused</code> and <code>populated</code>, being added and removed
via JavaScript. On this page, I’ve simply written a few lines of code to add and
remove them at the proper times, but in practice this is probably best done
through your frontend JavaScript framework (Angular/React/Vue/...), if you’re
using one.</p>
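<p>As a rough sketch of what that framework integration might look like, the class list can be derived from two booleans. The <code>formGroupClassName</code> helper below is hypothetical (it is not part of this page’s source); in React, for instance, its result could be passed as the container’s <code>className</code>.</p>

```javascript
// Hypothetical helper: build the container's class list from the two pieces
// of state that drive the styling. The class names match the CSS on this page.
function formGroupClassName({ focused, populated }) {
  const classes = ['form-group'];
  if (focused) classes.push('focused');
  if (populated) classes.push('populated');
  return classes.join(' ');
}

// While the user is typing into a non-empty field:
console.log(formGroupClassName({ focused: true, populated: true }));
// prints "form-group focused populated"
```

<p>Keeping this as a pure function means the component only has to track whether the input is focused and whether its value is non-empty.</p>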
<p>First, let’s talk about the moving placeholder. While CSS does have a
<code>::placeholder</code> pseudo-element that we can use for styling how the <code>placeholder</code>
attribute of the <code>&lt;input&gt;</code> is displayed, unfortunately we can’t use it here
because we want the placeholder to remain visible while the user edits the
field, and the browser-supplied placeholder vanishes when the field isn’t empty.</p>
<p>Another semantically-useful way to display this is the <code>&lt;label&gt;</code> element, so
that’s what I’ve used. The label is absolutely positioned to appear over the
<code>&lt;input&gt;</code> where you’d expect the placeholder. So our basic markup looks like
this:</p>
<pre><code>&lt;div class=&quot;form-group&quot;&gt;
  &lt;label class=&quot;control-label&quot;&gt;
    First Name
  &lt;/label&gt;
  &lt;input type=&quot;text&quot; class=&quot;form-control&quot;&gt;
&lt;/div&gt;
</code></pre>
<p>When the <code>populated</code> class is applied to the <code>form-group</code> div, an extra CSS rule
gets applied to the <code>control-label</code>, changing its position, size, and color. CSS
transitions are used to gently animate the movement.</p>
<p>The next interesting element is the heavy bottom border. It would be nice if we
could simply use <code>border-bottom</code> on the <code>&lt;input&gt;</code>, but we want to animate it
collapsing and expanding, and that wouldn’t be possible using <code>border-bottom</code>
without also collapsing and expanding the content of the text input, which we
definitely don’t want.</p>
<p>The solution I came up with was to use the <code>::after</code> pseudo-element to just
display a block of color. At rest, it has <code>width: 0</code>, but when the <code>focused</code>
class is applied to the containing <code>form-group</code>, then it gets <code>width: 100%</code> and
is again animated using CSS transitions.</p>
<p>This is annoyingly close to pure CSS. There are some hacks that can get even
closer to being pure CSS, like using the CSS sibling combinator <code>~</code> to write
rules like</p>
<pre><code>.form-control:focus ~ .control-label {
  /* the control is focused, move the label to the top */
}
</code></pre>
<p>but the ultimate stumbling block is that there’s no way to use the current value
of the text input in a CSS rule, so we can’t make the label disappear when the
input is blurred and non-empty. You can of course use an attribute selector in
your CSS like <code>input:not([value=''])</code>, but this only considers the actual
original attribute value, not whatever it might get changed to by the user later
on. You could of course write some JavaScript to make that happen, but if you’ve
resorted to JavaScript then you may as well just use the easier and cleaner
approach that toggles the classes.</p>
<p>There is <em>one</em> way I thought of that could work to do a pure CSS implementation.
There’s a <code>:valid</code> pseudo-class that considers the HTML form validation
state. If we make the <code>&lt;input&gt;</code> only valid when it is non-empty, either with the
<code>pattern</code> or <code>required</code> attributes, then we could write a rule like</p>
<pre><code>.form-control:not(:focus):valid ~ .control-label {
  /* the control is blurred and has a value, hide the label */
}
</code></pre>
<p>However, <code>:valid</code> isn’t supported in all browsers, and this presumes you aren’t
using the HTML form validation for anything else, so it’s a little too hacky to
rely on. In our case, we were already using React, so adding and removing the
classes with JavaScript ended up being quite easy.</p>
<p>Check out the source code for this page to get the code; I promise it’s easy to
understand!</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      Despite what you may believe from simply looking at this site, I’ve actually done quite a bit of front-end development. A couple of years ago, I worked on a project with a friend of mine. For part of the project, he’d designed the behavior of a form control inspired by Material Design which I then built from scratch. Recently, he asked me to remind him how I’d implemented it, and I thought I’d take the opportunity to turn it into a blog post.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Buzzword-Driven “Pop Infosec”</title>
    <link href="https://www.benburwell.com/posts/buzzword-driven-pop-infosec/" rel="alternate" type="text/html" title="Buzzword-Driven “Pop Infosec”" />
    <published>2019-08-06T00:00:00Z</published>
    <updated>2019-08-06T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/buzzword-driven-pop-infosec/</id>
    <content type="html" xml:base="/posts/buzzword-driven-pop-infosec/">
      <![CDATA[
        <p>Information security is complicated. When you combine that with the fact that an
increasing number of people seem to also consider it to be very important, the
result is something I like to call “pop infosec.”</p>
<p>As in pop science or popular psychology, making information security accessible
often involves simplifying concepts to improve their general palatability which
results in laypeople overestimating their confidence. This <a href="https://en.wikipedia.org/wiki/Easiness_effect">“easiness
effect”</a> has been studied in the
context of science communication, and likely applies to information security in
a parallel sense.</p>
<p>While helping people protect themselves from security threats is certainly
laudable, it’s important to do it responsibly in order to maximize benefit and
minimize harm. Unfortunately, a few recent events I’ve noticed personally
suggest that this is not happening.</p>
<h2 id="the-cloud">“The Cloud”</h2>
<p>I recently read (part of) an article in the Wall Street Journal (before I got
cut off by their paywall) about a data breach which read:</p>
<blockquote>
<p>The data was stored on Amazon.com Inc.’s cloud, according to a federal
criminal complaint and people familiar with the matter. The avenue of entry,
the companies and investigators said, was a poorly configured firewall [...]</p>
<p>Both companies say controls around the data, rather than use of the cloud,
were the problem. Still, the data was stored in the cloud, raising questions
about whether Capital One put insufficient safeguards in place to lock down
customer records when it adopted cloud technology.</p>
</blockquote>
<p>Clearly, the reporter has decided to inject some good old “ZOMG all ur dataz are
in teh cloud” fear mongering. That aside, this is some of the worst analysis
I’ve seen. Imagine you’re trying to keep a box of papers safe; the problem isn’t
that you kept the box in a self storage unit instead of in your house, the problem is
that you left the door unlocked. If the company had a poorly configured cloud
environment, why should I expect them to properly configure a firewall in some
other environment?</p>
<p>In other words, the WSJ has this <em>almost</em> right: it does raise questions about
whether sufficient safeguards were in place, but these questions are orthogonal
to any particular technologies or events.</p>
<p>This is simply confusion of correlation and causation. To cite a common example,
suppose you thought drowning deaths were a large problem and you learned that
there was a strong correlation between ice cream sales and drowning deaths.
Recognizing that swimming and eating ice cream are simply both summertime
activities, one would of course be mistaken to conclude that banning ice cream
would reduce the number of drowning deaths. Likewise, as more companies start
using cloud services, we should certainly not be surprised that more
vulnerabilities affecting cloud services are discovered.</p>
<p>For the record, I certainly do not believe that “the cloud” is a panacea, but
that security is only meaningful relative to a threat model which may or may not
involve where hardware happens to be physically located.</p>
<h2 id="high-severity-vulnerability">“High Severity Vulnerability”</h2>
<p>Apparently, all it takes to waste lots of time and energy and cause a big fuss
is to label something as “high severity.”</p>
<p>Consider this notice I saw when I logged on to GitHub one day:</p>
<p><img src="https://static.benburwell.com/blog/github-vuln-notice.png" alt="Screenshot of a GitHub alert which reads “We found a potential security
vulnerability in one of your
dependencies.”"></p>
<p>Clicking “See security alert” led me to the following notice:</p>
<p><img src="https://static.benburwell.com/blog/github-vuln-detail.png" alt="Screenshot of a GitHub notice describing a high severity CVE issued for axios
and recommending to update from 0.18.0 to
0.19.0"></p>
<p>I looked up CVE-2019-10742 and quickly located the relevant pull request for
axios. To save you some clicks, axios is a JavaScript HTTP client library which
includes an API like this:</p>
<pre tabindex="0"><code>axios
  .get(&#39;http://example.com/evil.txt&#39;)
  .then(console.log)
  .catch(console.error);
</code></pre><p>Optionally, you can use the <code>get</code> API like this:</p>
<pre tabindex="0"><code>axios
  .get(&#39;http://example.com/evil.txt&#39;, { maxContentLength: 100 })
  .then(console.log)
  .catch(console.error);
</code></pre><p>in which case axios is expected to abort the response and reject the promise
after more than 100 bytes have been received. However, there was a bug in the
implementation where the promise would be rejected but reading from the stream
would continue, hence the CVE. But look at the code snippets above! <strong>This CVE
only applies to codebases which actually <em>use</em> the <code>maxContentLength</code> option!</strong>
If you weren’t using <code>maxContentLength</code>, you weren’t expecting any responses to
be truncated in the first place. Nonetheless, I found lots of comments like</p>
<blockquote>
<p>will need to roll out a fix for compliance asap</p>
</blockquote>
<blockquote>
<p>When will this issue be fixed? I have received tons of mail from github
regarding axios.</p>
</blockquote>
<blockquote>
<p>I can help work on it if needed, but we would need to get rid of axios
otherwise on an open source SDK I’m actively maintaining</p>
</blockquote>
<blockquote>
<p>we really need to get a fix out, especially seeing as we’re now getting Github
notifications on this.</p>
</blockquote>
<p>Thanks to the way GitHub shows references from other issues/pull requests, I was
also able to see how people were responding to the vulnerability alert within
their own code. Of the random sampling of projects with linked issues/PRs I
audited, none of them actually used the <code>maxContentLength</code> option, but dutifully
updated the version of their axios dependency and considered the issue resolved.</p>
<p>In reality, nothing about these projects’ security posture actually changed,
though their maintainers may have <em>thought</em> it did. The real resolution for
many of these projects would be to first consider the impact if
<code>maxContentLength</code> was not set or respected, and if appropriate, update the
dependency <strong>and actually use <code>maxContentLength</code></strong>.</p>
<p>Of course, this is not the fault of the developers. Collectively, one of the
biggest things we tell people about protecting themselves from vulnerabilities
is to keep their software up to date. In this case, developers saw a helpful
message saying to update their dependencies, they updated them (possibly even
with the automatic click of a button!), and they <em>still</em> might have been
vulnerable.</p>
<h2 id="in-conclusion">In Conclusion</h2>
<p>Information security professionals need to be judicious about how and what is
communicated with or recommended to the public. As we’ve seen, “pop infosec” can
be ineffective or even harmful. And journalists need to ensure that their
reporting is consistent with evidence-based research.</p>
<p>I have said before that security is not a checklist, it is a mindset. You can’t
“be secure” by following some steps you find online or by avoiding certain
technologies. The most effective way to improve your security posture is to hire
smart people to think critically about your environment.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      Information security is complicated. When you combine that with the fact that an increasing number of people seem to also consider it to be very important, the result is something I like to call “pop infosec.”
As in pop science or popular psychology, making information security accessible often involves simplifying concepts to improve their general palatability which results in laypeople overestimating their confidence. This “easiness effect” has been studied in the context of science communication, and likely applies to information security in a parallel sense.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Vim vs Neovim on FreeBSD</title>
    <link href="https://www.benburwell.com/posts/vim-vs-neovim/" rel="alternate" type="text/html" title="Vim vs Neovim on FreeBSD" />
    <published>2019-07-22T00:00:00Z</published>
    <updated>2019-07-22T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/vim-vs-neovim/</id>
    <content type="html" xml:base="/posts/vim-vs-neovim/">
      <![CDATA[
        <p>I have a <a href="/freebsd.html">FreeBSD server</a> which primarily serves as a jail host.
As such, I’d like to keep its installed packages to a minimum. FreeBSD’s
default install comes with <code>vi</code>, but not <code>vim</code>. Using <code>vi</code> feels familiar
enough, but it becomes annoying not to have things like <code>gg</code> available. So I
decided to install vim to make my life a little nicer:</p>
<pre><code>$ sudo pkg install vim
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 103 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        vim: 8.1.1439
        libXpm: 3.5.12_2
        libXext: 1.3.4,1
        libXau: 1.0.9
        libX11: 1.6.8,1
        libxcb: 1.13.1
        libXdmcp: 1.1.3
        xorgproto: 2019.1
        libxml2: 2.9.9
        libpthread-stubs: 0.4
        libXt: 1.2.0,1
        libSM: 1.2.3,1
        libICE: 1.0.9_3,1
        pango: 1.42.4_2
        libXrender: 0.9.10_2
        xorg-fonts-truetype: 7.7_1
        font-misc-meltho: 1.0.3_4
        mkfontscale: 1.2.1
        libfontenc: 1.1.4
        freetype2: 2.10.0
        fontconfig: 2.12.6,1
        font-misc-ethiopic: 1.0.3_4
        font-bh-ttf: 1.0.3_4
        encodings: 1.0.5,1
        font-util: 1.3.1
        dejavu: 2.37_1
        libXft: 2.3.2_3
        harfbuzz: 2.5.3
        graphite2: 1.3.13
        cairo: 1.16.0,2
        pixman: 0.34.0_1
        png: 1.6.37
        mesa-libs: 18.3.2_1
        libxshmfence: 1.3
        libXxf86vm: 1.1.4_3
        libXfixes: 5.0.3_2
        libXdamage: 1.1.5
        wayland: 1.16.0_1
        libepoll-shim: 0.0.20190311
        libdrm: 2.4.98_1,1
        libpciaccess: 0.14
        pciids: 20190620
        libunwind: 20170615
        glib: 2.56.3_5,1
        xkeyboard-config: 2.27
        libXrandr: 1.5.2
        libedit: 3.1.20190324,1
        libepoxy: 1.5.2
        fribidi: 0.19.7
        gtk3: 3.24.9
        libxkbcommon: 0.8.4
        libXinerama: 1.1.4_2,1
        libXi: 1.7.10,1
        libXcursor: 1.2.0
        libXcomposite: 0.4.5,1
        adwaita-icon-theme: 3.28.0
        gtk-update-icon-cache: 2.24.32
        shared-mime-info: 1.10_1
        hicolor-icon-theme: 0.17
        gdk-pixbuf2: 2.36.12
        tiff: 4.0.10_1
        jpeg-turbo: 2.0.2
        jbigkit: 2.1_1
        atk: 2.28.1
        cups: 2.2.11
        gnutls: 3.6.8
        trousers: 0.3.14_2
        tpm-emulator: 0.7.4_2
        gmp: 6.1.2_1
        p11-kit: 0.23.16.1
        libtasn1: 4.13_1
        nettle: 3.4.1_1
        libidn2: 2.2.0
        libunistring: 0.9.10_1
        libpaper: 1.1.24.4
        avahi-app: 0.7_2
        gnome_subr: 1.0
        libdaemon: 0.14_1
        gobject-introspection: 1.56.1,1
        dbus-glib: 0.110
        dbus: 1.12.12
        gdbm: 1.18.1_1
        wayland-protocols: 1.17
        librsvg2: 2.40.20
        libcroco: 0.6.12
        libgsf: 1.14.44
        colord: 1.3.5
        polkit: 0.114_2
        spidermonkey52: 52.9.0_3
        nspr: 4.21
        icu: 64.2,1
        sqlite3: 3.28.0
        desktop-file-utils: 0.23
        lcms2: 2.9
        argyllcms: 1.9.2_4
        libXScrnSaver: 1.2.3_2
        at-spi2-atk: 2.26.2
        at-spi2-core: 2.28.0
        libXtst: 1.2.3_2
        ruby: 2.5.5_2,1
        libyaml: 0.2.2
        ctags: 5.8
        cscope: 15.8b_1

Number of packages to be installed: 103

The process will require 517 MiB more space.
96 MiB to be downloaded.
</code></pre>
<p>Whoa, what?! Why do I need wayland and gtk for <em>vim</em>? <code>^C^C^C</code></p>
<pre><code>$ sudo pkg install neovim
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 7 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        neovim: 0.3.8
        luajit: 2.0.5_3
        unibilium: 2.0.0
        msgpack: 3.2.0
        libvterm: git20161218
        libuv: 1.30.1
        libtermkey: 0.22

Number of packages to be installed: 7

The process will require 28 MiB more space.
5 MiB to be downloaded.
</code></pre>
<p>Much more palatable.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I have a FreeBSD server which primarily serves as a jail host. As such, I’d like to keep its installed packages to a minimum. FreeBSD’s default install comes with vi, but not vim. Using vi feels familiar enough, but it becomes annoying not to have things like gg available. So I decided to install vim to make my life a little nicer:
$ sudo pkg install vim Updating FreeBSD repository catalogue.
    </summary>
  </entry>
  
  <entry>
    <title type="html">FreeBSD Jail Networking Continued</title>
    <link href="https://www.benburwell.com/posts/freebsd-jail-networking-continued/" rel="alternate" type="text/html" title="FreeBSD Jail Networking Continued" />
    <published>2018-10-13T00:00:00Z</published>
    <updated>2018-10-13T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/freebsd-jail-networking-continued/</id>
    <content type="html" xml:base="/posts/freebsd-jail-networking-continued/">
      <![CDATA[
        <p>I decided to take another crack at the jail configuration I started in
<a href="/posts/freebsd-jails/">Experiment 1</a>. After reading bits and
pieces of a few random websites (including various ServerFault posts), on an
inkling I added the line <code>interface = &quot;bge0&quot;;</code> to my <code>/etc/jail.conf</code> file and
ran <code>service jail restart www</code> (<code>bge0</code> is my LAN interface on the host). After
<code>jexec</code>ing in, I tried <code>pkg install nginx</code> again and it worked like a charm!</p>
<p>I also noticed that when I run <code>ifconfig</code> on my host now, both the original
10.0.2.201 and the jail’s 10.0.2.202 addresses had been added to the <code>bge0</code>
interface. I wondered whether that meant that I could now SSH into the host
using the jail’s IP address. So on my laptop, I ran <code>ssh bb@10.0.2.202</code> and lo
and behold, it worked. The opposite, however, is <em>not</em> true: loading
<code>http://10.0.2.201</code> in a web browser does not give me the beautiful “welcome to
nginx” page that <code>http://10.0.2.202</code> has.</p>
<p>I’m sure some trickier stuff will arise when dealing with NAT and multiple
interfaces, but for now I’m satisfied that I have a basic understanding of how
to set up a service in a jail and expose it to the network.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I decided to take another crack at the jail configuration I started in Experiment 1. After reading bits and pieces of a few random websites (including various ServerFault posts), on an inkling I added the line interface = &quot;bge0&quot;; to my /etc/jail.conf file and ran service jail restart www (bge0 is my LAN interface on the host). After jexecing in, I tried pkg install nginx again and it worked like a charm!
    </summary>
  </entry>
  
  <entry>
    <title type="html">How does DHCP work?</title>
    <link href="https://www.benburwell.com/posts/how-does-dhcp-work/" rel="alternate" type="text/html" title="How does DHCP work?" />
    <published>2018-10-09T00:00:00Z</published>
    <updated>2018-10-09T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/how-does-dhcp-work/</id>
    <content type="html" xml:base="/posts/how-does-dhcp-work/">
      <![CDATA[
        <p>DHCP (Dynamic Host Configuration Protocol) is an integral part of most networks,
from small home networks to campuses serving thousands of devices. I recently
realized that I didn’t have a solid understanding of how it functions. I knew
that DHCP was used to obtain an IP address from a central server when joining a
network, but wasn’t clear on how that negotiation takes place. How could a
machine without an IP address talk to a server that it didn’t know the address
of?</p>
<p>To learn more, I started a Wireshark capture and then connected my computer to a
network to see what happened. I immediately discovered that DHCP is part of the
Bootstrap Protocol (also known as <code>BOOTP</code>), which is transported over UDP/IP.
DHCP servers read and write on port 67, while DHCP clients read and write on
port 68. Before the client has acquired an IP address, it uses <code>0.0.0.0</code> as the
source address for packets it transmits, and addresses its packets to the
broadcast address <code>255.255.255.255</code>.</p>
<p>For the simple case that I examined, I found that there are four messages
involved in acquiring an IP address: Discover, Offer, Request, and ACK. At a
high level, the client broadcasts a request for an address, a DHCP server
responds with an offer, the client makes a request based on the offer it
received, and finally the DHCP server acknowledges the request.</p>
<h2 id="step-1-discovery">Step 1: Discovery</h2>
<p>The client sends a UDP broadcast packet from <code>0.0.0.0:68</code> to
<code>255.255.255.255:67</code>. This is a BOOTP Discover message that includes details
about what information is being requested from the network’s authoritative DHCP
server. In the case I observed, the following items were requested:</p>
<ul>
<li>Subnet mask</li>
<li>Classless static route</li>
<li>Router</li>
<li>DNS server</li>
<li>Domain name</li>
<li>Proxy autodiscovery</li>
<li>LDAP server</li>
<li>NetBIOS Name Server</li>
<li>NetBIOS Node Type</li>
</ul>
<p>A DHCP lease time of 90 days was requested, and my DHCP client identifier (MAC
address) and hostname were also included.</p>
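<p>To make the shape of that message concrete, here is a minimal sketch of a Discover payload built with Node’s <code>Buffer</code>. The field offsets follow the fixed BOOTP header layout from RFC 2131; the <code>buildDiscover</code> name, the MAC address, and the transaction ID are made up for illustration, and the options I actually saw (requested parameters, lease time, hostname) are left out for brevity.</p>

```javascript
// Sketch of a minimal DHCPDISCOVER payload (the UDP body, not the full frame).
// Offsets follow the fixed BOOTP header in RFC 2131; the values are examples.
function buildDiscover(mac, xid, secs) {
  const header = Buffer.alloc(236);    // fixed-size BOOTP header, zero-filled
  header.writeUInt8(1, 0);             // op: 1 = BOOTREQUEST (client to server)
  header.writeUInt8(1, 1);             // htype: 1 = Ethernet
  header.writeUInt8(6, 2);             // hlen: a MAC address is 6 bytes
  header.writeUInt32BE(xid, 4);        // xid: transaction ID chosen by the client
  header.writeUInt16BE(secs, 8);       // secs: the "seconds elapsed" field
  Buffer.from(mac).copy(header, 28);   // chaddr: client hardware (MAC) address
  const options = Buffer.from([
    0x63, 0x82, 0x53, 0x63,            // magic cookie that introduces DHCP options
    53, 1, 1,                          // option 53 (message type), length 1, 1 = DISCOVER
    255,                               // end-of-options marker
  ]);
  return Buffer.concat([header, options]);
}

// Made-up MAC address and transaction ID.
const discover = buildDiscover([0x00, 0x11, 0x22, 0x33, 0x44, 0x55], 0x3903f326, 0);
console.log(discover.length); // prints 244 (236-byte header plus 8 bytes of options)
```

<p>The address fields (client, “your”, server, and relay IP) are simply left as zeros here, which fits a client that doesn’t have an address yet.</p>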
<p>In the case I observed, the first discovery packet went unanswered, so after
1.125 seconds a second discovery packet was transmitted. Since UDP does not
guarantee delivery, it makes sense that a basic retransmission mechanism would
be part of the protocol to handle dropped packets. While TCP uses a sequence
number to correctly order packets, BOOTP appears to use a somewhat surprising
metric: its header contains a “seconds elapsed” field which was set to 0 for
the first discovery packet and 1 for the packet 1.125 seconds later.</p>
<h2 id="step-2-offer">Step 2: Offer</h2>
<p>The server sends a UDP packet from <code>192.168.1.1:67</code> to <code>192.168.1.2:68</code> containing
a DHCP Offer message. There are a few ways we can tell this offer is for us:</p>
<ul>
<li>The BOOTP Transaction ID field is set to the value that we sent in our
Discover packet</li>
<li>The Client MAC address field in the BOOTP message is set to ours</li>
<li>At the Ethernet layer, the destination address is also set to our MAC address</li>
</ul>
<p>In this offer message, we get the responses to some of the questions we asked in
our Discover packet. In this case, we are offered a lease time of <code>3600</code> (one
hour, much less than our requested 90 days). We are instructed to renew after 30
minutes, rebind after 52 minutes 30 seconds, and given a netmask of
<code>255.255.255.0</code>. We’re also informed of the router/DNS server’s address of
<code>192.168.1.1</code> and supplied with the domain name <code>home</code> (so our machine’s “FQDN”
will be <code>&lt;hostname&gt;.home</code>).</p>
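<p>Those renew and rebind times line up with the defaults RFC 2131 describes, namely 50% and 87.5% of the lease duration. The server in my capture sent the values explicitly, so treat the following as a sketch of the arithmetic rather than what my router actually computes; the function names are my own.</p>

```javascript
// Default timer fractions from RFC 2131: T1 (renew) at 50% of the lease and
// T2 (rebind) at 87.5% of the lease.
function defaultTimers(leaseSeconds) {
  return {
    renew: leaseSeconds * 0.5,
    rebind: leaseSeconds * 0.875,
  };
}

// Format a whole number of seconds as minutes and seconds.
function formatDuration(seconds) {
  const minutes = Math.floor(seconds / 60);
  return `${minutes}m ${seconds % 60}s`;
}

const timers = defaultTimers(3600); // the one-hour lease from the Offer
console.log(formatDuration(timers.renew));  // prints "30m 0s"
console.log(formatDuration(timers.rebind)); // prints "52m 30s"
```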
<p>To figure out the address we have been offered, we can look at either the IP
address that the packet was sent to, or we can examine the “Your IP” field in
the BOOTP message.</p>
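<p>Putting the last few points together, a client-side check might look something like this sketch. It assumes the same RFC 2131 header offsets (transaction ID at byte 4, “Your IP” at byte 16, client MAC at byte 28); <code>readOffer</code> and the sample values are hypothetical, and a real client would also parse the options.</p>

```javascript
// Return the offered address if the packet matches our transaction ID and MAC
// address, or null if the offer was meant for some other client.
function readOffer(packet, expectedXid, ourMac) {
  const xid = packet.readUInt32BE(4);        // BOOTP Transaction ID
  const chaddr = packet.subarray(28, 34);    // first 6 bytes of the client MAC field
  if (xid !== expectedXid || !chaddr.equals(Buffer.from(ourMac))) {
    return null;
  }
  return Array.from(packet.subarray(16, 20)).join('.'); // the "Your IP" field
}

// Build a fake Offer header to exercise the function (all values made up).
const mac = [0x00, 0x11, 0x22, 0x33, 0x44, 0x55];
const offer = Buffer.alloc(236);
offer.writeUInt8(2, 0);                          // op: 2 = BOOTREPLY
offer.writeUInt32BE(0x3903f326, 4);              // echo of our transaction ID
Buffer.from([192, 168, 1, 2]).copy(offer, 16);   // the address being offered
Buffer.from(mac).copy(offer, 28);                // our MAC address

console.log(readOffer(offer, 0x3903f326, mac));  // prints "192.168.1.2"
console.log(readOffer(offer, 0x00000001, mac));  // prints null (wrong transaction ID)
```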
<h2 id="step-3-request">Step 3: Request</h2>
<p>Now that we’ve received an offer, we make a request for the offer. This mostly
involves reiterating the initial request, again sent from <code>0.0.0.0:68</code> to
<code>255.255.255.255:67</code>. Additionally, the message includes a “Requested IP” field
that specifies the IP address from the Offer.</p>
<h2 id="step-4-acknowledgement">Step 4: Acknowledgement</h2>
<p>Finally, the DHCP server acknowledges our request. This completes the process of
IP address acquisition. The server reiterates the correct parameters it provided
in the Offer, including the rebinding and renewal periods, netmask, etc.</p>
<hr>
<p>Some observations: it makes sense to see UDP used for this protocol rather than
TCP since TCP is connection-oriented and we don’t know the address of the server
(nor our own address for that matter) at the beginning of this process. It’s
also easy to imagine havoc being wreaked on a network by creating a rogue DHCP
server that provides fake leases with conflicting IP addresses.</p>
<p>Armed with my basic knowledge of how DHCP functions, I wanted to better
understand some of what I had encountered while experimenting. For instance,
what is the difference between “rebinding” and “renewal”? What is the reason for
using “seconds elapsed” as a kind of sequence number? My next stop to find
answers was the IETF RFCs.</p>
<p>As of this writing, there have been three iterations of the DHCP RFC, along with
a few other extension/option RFCs. All three were written by Ralph Droms of
Bucknell University. The first two (<a href="https://tools.ietf.org/html/rfc1531">RFC 1531</a> and <a href="https://tools.ietf.org/html/rfc1541">RFC 1541</a>)
were published in October 1993, and the latest version, <a href="https://tools.ietf.org/html/rfc2131">RFC 2131</a>, was
published in March 1997. For historical context, I wanted to learn what had
changed throughout the versions, so I ran <code>$ diff rfc1531.txt rfc1541.txt</code> (this
is one of those times that I love having the <a href="https://www.rfc-editor.org/retrieve/rsync/">RFC repository available
locally</a>). There don’t seem to be any
protocol changes between RFC 1531 and RFC 1541, just a few formatting and
phrasing changes. Running <code>diff</code> on RFC 1531 and RFC 2131 produced quite a large
output that I was not eager to read through, but conveniently, section 1.1 of
RFC 2131 is called “Changes to RFC 1541”. The 1997 changes are described as:</p>
<blockquote>
<p>This document updates the DHCP protocol specification that appears in RFC1541.
A new DHCP message type, DHCPINFORM, has been added; see section 3.4, 4.3 and
4.4 for details. The classing mechanism for identifying DHCP clients to DHCP
servers has been extended to include &quot;vendor&quot; classes as defined in sections
4.2 and 4.3. The minimum lease time restriction has been removed. Finally,
many editorial changes have been made to clarify the text as a result of
experience gained in DHCP interoperability tests.</p>
</blockquote>
<p>Interestingly, the terms we’re used to seeing defined in <a href="https://tools.ietf.org/html/rfc2119">RFC 2119</a>
(MUST, MUST NOT, REQUIRED, etc.) are defined within RFC 2131 itself. On
closer inspection, RFC 2119 was <em>also</em> published in March 1997!</p>
<p>With regard to my lingering questions, I learned that “renewing” is when a
client is attempting to extend its lease by recontacting the server that
initially granted it. If that server hasn’t answered by the time a second timer
expires, the client enters the “rebinding” state, in which it broadcasts its
request so that any DHCP server can extend the lease. (If a server explicitly
refuses with a DHCPNAK, the client instead starts over from the beginning.)</p>
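<p>As a rough illustration of how a client moves between these states, here is a
minimal Python sketch of the lease timers. RFC 2131 gives default values for the
renewal timer (T1) and the rebinding timer (T2) of 0.5 and 0.875 times the lease
duration. The function and state names are hypothetical, chosen for this example,
and the sketch assumes the server never answers; a real client also adds some
random fuzz to the timers and reacts to server replies.</p>
<pre tabindex="0"><code>import bisect

# States a client passes through as its lease ages (illustrative names).
STATES = ["BOUND", "RENEWING", "REBINDING", "EXPIRED"]


def lease_state(elapsed, lease, t1_factor=0.5, t2_factor=0.875):
    """Return the state of a `lease`-second lease after `elapsed` seconds.

    Assumes no server ever replies, so only the timers matter.
    """
    thresholds = [t1_factor * lease, t2_factor * lease, lease]
    # bisect_right counts how many of the thresholds have been reached.
    return STATES[bisect.bisect_right(thresholds, elapsed)]


if __name__ == "__main__":
    # One-hour lease: T1 is 1800 seconds and T2 is 3150 seconds.
    for elapsed in (0, 1800, 3150, 3600):
        print(elapsed, lease_state(elapsed, 3600))
</code></pre><p>With a one-hour lease, the client in this sketch would start unicasting
renewals to its original server after 30 minutes and fall back to broadcasting
after 52.5 minutes.</p>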
<p>I was only able to find one mention of an actual use for the “seconds” field (on
page 15):</p>
<blockquote>
<p>To help ensure that any BOOTP relay agents forward the DHCPREQUEST message to
the same set of DHCP servers that received the original DHCPDISCOVER message,
the DHCPREQUEST message MUST use the same value in the DHCP message header's
'secs' field and be sent to the same IP broadcast address as the original
DHCPDISCOVER message.</p>
</blockquote>
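<p>To make the “secs” field a little more concrete, here is a small Python sketch
that reads it out of a raw message. It relies only on the fixed header layout
from RFC 2131 (op, htype, hlen, and hops at one byte each, then a four-byte xid,
then two-byte secs and flags fields). The helper names are hypothetical, and the
example bytes are hand-built, not taken from one of my captures.</p>
<pre tabindex="0"><code>import struct

# First 12 bytes of the fixed BOOTP/DHCP header, in network byte order:
# op, htype, hlen, hops (1 byte each), xid (4 bytes), secs and flags (2 bytes each).
HEADER_PREFIX = struct.Struct("!BBBBIHH")


def read_xid_and_secs(message):
    """Return (xid, secs) from the start of a raw DHCP message."""
    op, htype, hlen, hops, xid, secs, flags = HEADER_PREFIX.unpack_from(message)
    return xid, secs


def same_secs(discover, request):
    """True if a DHCPREQUEST repeats the secs value of the DHCPDISCOVER."""
    return read_xid_and_secs(discover)[1] == read_xid_and_secs(request)[1]


if __name__ == "__main__":
    # Hand-built header prefixes for illustration only.
    discover = HEADER_PREFIX.pack(1, 1, 6, 0, 0x3903F326, 4, 0)
    request = HEADER_PREFIX.pack(1, 1, 6, 0, 0x3903F326, 4, 0)
    print(read_xid_and_secs(discover), same_secs(discover, request))
</code></pre><p>A check like <code>same_secs</code> is roughly what you would look for by eye in
Wireshark when comparing a DHCPDISCOVER with the DHCPREQUEST that follows it.</p>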
<p>I did notice that there are a lot of sections with language like “a DHCP server
MAY extend a client’s lease <strong>only if it has local administrative authority</strong> to
do so” (emphasis added). But what if someone were to put a rogue DHCP server on
the network, one that did <em>not</em> have “local administrative authority”? It’s
probably quite possible to wreak a bit of havoc by creating a rogue DHCP server,
though perhaps not quite as easy as it might seem. Since DHCP leases often last
for some time (hours or days), existing clients might not be affected by the
appearance of a new server for quite a while. Besides, due to the binding
mechanism, when a client needs to renew its lease, it sends a unicast message
directly to the server it initially obtained the lease from rather than
immediately resorting to broadcasting a DHCPDISCOVER message.</p>
<p>Since DHCP is often employed on a contiguous physical network segment, it may
not always be possible to use a firewall to block traffic to the server port
(67). This would require some sort of Layer 2 firewall, which I’m sure exists,
but doesn’t seem to be widely deployed (or recommended). It would of course be
possible to set up rules on a Layer 3/4 firewall to block traffic to port 67 on
machines not authorized to act as DHCP servers to prevent a rogue server from
having any effect outside its physical segment.</p>
<p>In conclusion:</p>
<ul>
<li>Wireshark is a great learning tool</li>
<li>RFCs are educational from a technical as well as a historical perspective</li>
<li>Now I know how DHCP works in a bit more depth</li>
</ul>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      DHCP (Dynamic Host Configuration Protocol) is an integral part of most networks, from small home networks to campuses serving thousands of devices. I recently realized that I didn’t have a solid understanding of how it functions. I knew that DHCP was used to obtain an IP address from a central server when joining a network, but wasn’t clear on how that negotiation takes place. How could a machine without an IP address talk to a server that it didn’t know the address of?
    </summary>
  </entry>
  
  <entry>
    <title type="html">FreeBSD Experiment 1: Jails</title>
    <link href="https://www.benburwell.com/posts/freebsd-jails/" rel="alternate" type="text/html" title="FreeBSD Experiment 1: Jails" />
    <published>2018-09-20T00:00:00Z</published>
    <updated>2018-09-20T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/freebsd-jails/</id>
    <content type="html" xml:base="/posts/freebsd-jails/">
      <![CDATA[
        <p>In my preparations for removing ESXi, I tried creating a simple jail on my test
box <code>helios</code>. As part of my purpose is to learn as much as possible, I decided
against using a tool like <code>ezjail</code> in favor of doing it “by hand.” While the
FreeBSD Handbook has some information on creating jails without using additional
tools, pretty much every other document I found suggested using ezjail. There’s
a chance I’ll revisit ezjail in the future, as it seems to have some helpful
features like having a “base jail” so you only need one copy of the FreeBSD base
system, but for now I’d like to do as much as possible without additional tools.</p>
<p>My goal for this experiment was to set up a simple web server (nginx) inside a
jail. To start, I edited <code>/etc/jail.conf</code> to contain the following:</p>
<pre tabindex="0"><code>www {
  host.hostname = www.local;
  ip4.addr = 10.0.2.202;
  path = &#34;/usr/jail/www&#34;;
  exec.start = &#34;/bin/sh /etc/rc&#34;;
  exec.stop = &#34;/bin/sh /etc/rc.shutdown&#34;;
}
</code></pre><p>Next, I used <code>bsdinstall(8)</code> to install the base system instead of compiling
from source:</p>
<pre tabindex="0"><code>root@helios:~ # bsdinstall jail /usr/jail/www
</code></pre><p>I then added <code>jail_enable=&quot;YES&quot;</code> to <code>/etc/rc.conf</code> and started the jail:</p>
<pre tabindex="0"><code>root@helios:~ # service jail start www
</code></pre><p>This took a few seconds to complete, and then the jail showed up when I ran
<code>jls</code>:</p>
<pre tabindex="0"><code>root@helios:~ # jls
   JID  IP Address      Hostname                      Path
     1  10.0.2.202      www.local                     /usr/jail/www
</code></pre><p>I was able to enter the jail:</p>
<pre tabindex="0"><code>root@helios:~ # jexec www /bin/sh
#
</code></pre><p>But I seem not to have Internet connectivity, as attempting to use <code>pkg-ng</code>
fails:</p>
<pre tabindex="0"><code># pkg install nginx
The package management tool is not yet installed on your system.
Do you want to fetch and install it now? [y/N]: y
Bootstrapping pkg from pkg+http://pkg.FreeBSD.org/FreeBSD:11:amd64/quarterly, please wait...
pkg: Error fetching http://pkg.FreeBSD.org/FreeBSD:11:amd64/quarterly/Latest/pkg.txz: Non-recoverable resolver failure
A pre-built version of pkg could not be found for your system.
Consider changing PACKAGESITE or installing it from ports: &#39;ports-mgmt/pkg&#39;.
</code></pre><p>Running <code>ifconfig</code> inside the jail shows that I do not seem to have an IP
address, nor can I seem to communicate with any hosts. Interestingly when I
attempt to ping my gateway, I get the message:</p>
<pre tabindex="0"><code>ping: ssend socket: Operation not permitted
</code></pre><p>Clearly there’s something I’ve not yet figured out.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      In my preparations for removing ESXi, I tried creating a simple jail on my test box helios. As part of my purpose is to learn as much as possible, I decided against using a tool like ezjail in favor of doing it “by hand.” While the FreeBSD Handbook has some information on creating jails without using additional tools, pretty much every other document I found suggested using ezjail. There’s a chance I’ll revisit ezjail in the future, as it seems to have some helpful features like having a “base jail” so you only need one copy of the FreeBSD base system, but for now I’d like to do as much as possible without additional tools.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Notes on setting up a FreeBSD home server</title>
    <link href="https://www.benburwell.com/posts/freebsd-prologue/" rel="alternate" type="text/html" title="Notes on setting up a FreeBSD home server" />
    <published>2018-09-17T00:00:00Z</published>
    <updated>2018-09-17T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/freebsd-prologue/</id>
    <content type="html" xml:base="/posts/freebsd-prologue/">
      <![CDATA[
        <p>A few months ago, I purchased a beefy second-hand tower to act as a home server.
I was looking to bring some of the services that I was previously outsourcing
into a single location, and to expand my familiarity with networking and systems
administration. Specifically, I wanted to:</p>
<ul>
<li>Replace the small DigitalOcean box that I was using as a VPN/proxy when I
needed to use public WiFi</li>
<li>Stop paying for a GitHub subscription to host private repositories</li>
<li>Have a better home media and file sharing/backup solution</li>
<li>Host a Minecraft server (nothing too serious, I occasionally play with a few
friends)</li>
<li>Have a stable home for various VMs that I spin up as part of my security lab
(I’ve been playing around with pen testing and trying to learn more about
Windows as a part of this).</li>
</ul>
<p>My initial solution was to install a free version of VMware ESXi as a hypervisor
and create several virtual machines. It was actually quite easy to get ESXi up
and running and start creating VMs. For the past several months, my home network
has been completely routed through the server (it has dual Ethernet, so I’m
using pfSense in a VM as my firewall/NAT/DHCP/etc.), and I’ve spun up several VMs
(mostly Ubuntu) for things like GitLab and Minecraft.</p>
<p>However, there are a few things that I don’t quite like. First, I had an incident
following a power outage after my free trial of ESXi had expired but before I
had entered my free license key in the UI. This resulted in my pfSense VM not
auto-booting and due to some poor configuration on my part, I was unable to
access the ESXi web UI to enter the license key without resetting the network
settings through the ESXi console. This brings me to my second gripe: the ESXi
web UI is <em>very</em> buggy and overall pretty awful to use. Certain pages have to be
reloaded to work properly, dialogs are randomly empty, etc. Thirdly, I’ve found
myself creating a “general purpose” VM that I can SSH into remotely. While
there’s nothing explicitly <em>wrong</em> with this, it just doesn’t feel quite right
to me to have a general purpose server that is completely parallel to my other
server VMs.</p>
<p>As a result of these shortcomings and learnings, I have decided to embark upon a
journey towards further simplification and reliability. I’ll be replacing ESXi
with FreeBSD, a rock-solid operating system. Rather than running a utility VM,
I’ll simply have the FreeBSD system on the server itself as a “base of
operations.”</p>
<p>I plan to learn more about and use several tools during this process. Currently,
I only have one 2 TB drive installed. I plan to add a second one and use zfs to
create a mirrored vdev pool for redundancy. This will make me feel a lot better
about using my server as a backup destination. Of course, this in itself is not
a complete backup solution, but it’s a significant step forward from just
relying on a single disk. Rather than running pfSense in a VM, I plan to just
use the ISC DHCP server from the ports collection and use the built-in <code>pf</code>
firewall to accomplish just about everything I was using pfSense for. I’ll
likely also end up running a BIND DNS server for a few local network things.</p>
<p>I am still learning about jails in FreeBSD, but I think they could replace a few
of the VMs I have currently, such as the Minecraft and GitLab servers. I plan to
use bhyve to run things like Windows VMs for pen testing that jails are clearly
not suited for.</p>
<p>I’ve used FreeBSD as my desktop OS in the past, and really love how it feels
compared with GNU/Linux. Everything just seems more straightforward, and I was
surprised to find that things like graphics drivers Just Work™ under
FreeBSD where they require a lot of ugly finagling under Linux. I’m quite
looking forward to using FreeBSD more frequently, and gaining more depth
in some of its great tools like jails and pf.</p>
<p>To start making the transition (which might be a little painful), I’ve installed
a fresh copy of FreeBSD 11.2 on a currently-unused machine to start poking
around with zfs configurations, jails, and bhyve. This will give me the
foundation I need to effectively set up my top-level environment and hopefully
get it mostly right the first time. Incidentally, I’m also about halfway
through reading <a href="https://nostarch.com/pf3">The Book of PF</a> from No Starch Press,
which will no doubt be helpful in my transition from pfSense to pure pf.</p>
<p>I intend to update this page with notes as I continue on my FreeBSD journey.
Stay tuned!</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      A few months ago, I purchased a beefy second-hand tower to act as a home server. I was looking to bring some of the services that I was previously outsourcing into a single location, and to expand my familiarity with networking and systems administration. Specifically, I wanted to:
Replace the small DigitalOcean box that I was using as a VPN/proxy when I needed to use public WiFi Stop paying for a GitHub subscription to host private repositories Have a better home media and file sharing/backup solution Host a Minecraft server (nothing too serious, I occasionally play with a few friends) Have a stable home for various VMs that I spin up as part of my security lab (I’ve been playing around with pen testing and trying to learn more about Windows as a part of this).
    </summary>
  </entry>
  
  <entry>
    <title type="html">Whitelisting Tor on CloudFlare</title>
    <link href="https://www.benburwell.com/posts/whitelisting-tor-on-cloudflare/" rel="alternate" type="text/html" title="Whitelisting Tor on CloudFlare" />
    <published>2016-04-08T00:00:00Z</published>
    <updated>2016-04-08T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/whitelisting-tor-on-cloudflare/</id>
    <content type="html" xml:base="/posts/whitelisting-tor-on-cloudflare/">
      <![CDATA[
        <p>On March 30th, 2016, CloudFlare posted <a href="https://blog.cloudflare.com/the-trouble-with-tor/">a blog entry entitled “The Trouble with
Tor”</a> outlining the issues
CloudFlare has with serving clients’ sites to Tor users. The Tor project quickly
followed it up with <a href="https://blog.torproject.org/blog/trouble-cloudflare">their own post, “The Trouble with
CloudFlare”</a>, which
presented an analysis of the situation from Tor’s perspective.</p>
<p>CloudFlare’s post acknowledged that Tor does play an important role on the
internet, but presented the irrelevant conclusion that, of “Security, Anonymity,
Convenience: Pick Any Two,” security and convenience will necessarily be the
choices of their customers. Certainly, all three properties are important, but
not all of their customers’ sites will be subject to the same risks.</p>
<p>I use CloudFlare’s services on several sites, including this one. On some of my
sites, I do rely on CloudFlare to provide some measure of security, particularly
ones with dynamic content. However, for a site like this one that is entirely
static, I have nothing to gain from hiding my content due to a perceived
security threat. Everything on this site is considered public, and there are no
attack vectors that are prevented through CloudFlare doing browser verification.</p>
<p>On the other hand, anonymity is quite important to me. Where it does not present
a security risk to disable CloudFlare’s browser verification, I have chosen to
whitelist Tor users on this site. There is little to be lost from bots or
spammers accessing this site at will, and there is much to be gained from
ensuring that people who consider their privacy important are able to access
content without undue hindrance.</p>
<p>CloudFlare does provide an easy way to whitelist all Tor traffic, and they even
presented it in their original blog post. To whitelist Tor, go to the Firewall
app in your CloudFlare dashboard and add an Access Rule. Enter <code>T1</code> as the
country code (the special code for Tor), and select Whitelist as the action.
Now, Tor users will not be presented with a CAPTCHA when visiting your site.</p>
<p>To see it in action for yourself, <a href="https://www.torproject.org/projects/torbrowser.html.en">download the Tor
browser</a> and try
visiting your site before and after adding the firewall rule. More information
about how CloudFlare handles Tor traffic can be found <a href="https://support.cloudflare.com/hc/en-us/articles/203306930-Does-CloudFlare-block-Tor-">on their Help Center
page</a>.</p>
<p>While whitelisting Tor is not the right solution for every site, I encourage you
to consider whether yours is a good candidate. Let me know your thoughts!</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      On March 30th, 2016, CloudFlare posted a blog entry entitled “The Trouble with Tor” outlining the issues CloudFlare has with serving clients’ sites to Tor users. The Tor project quickly followed it up with their own post, “The Trouble with CloudFlare”, which presented an analysis of the situation from Tor’s perspective.
CloudFlare’s post acknowledged that Tor does play an important role on the internet, but presented the irrelevant conclusion that, of “Security, Anonymity, Convenience: Pick Any Two,” security and convenience will necessarily be the choices of their customers.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Getting Login to Work on Ubuntu 15.04 with NVIDIA Drivers</title>
    <link href="https://www.benburwell.com/posts/getting-login-to-work-ubuntu-15.04-nvidia/" rel="alternate" type="text/html" title="Getting Login to Work on Ubuntu 15.04 with NVIDIA Drivers" />
    <published>2015-04-23T00:00:00Z</published>
    <updated>2015-04-23T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/getting-login-to-work-ubuntu-15.04-nvidia/</id>
    <content type="html" xml:base="/posts/getting-login-to-work-ubuntu-15.04-nvidia/">
      <![CDATA[
        <p>When I upgraded to Ubuntu 15.04, I was unable to log in. The machine started
normally and I was presented with the login window. But when I entered my
password, the screen went black for a few moments and then the login screen came
back.</p>
<p>Since I’m using an <a href="http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-750">NVIDIA GeForce GTX
750</a>, which
Ubuntu’s Nouveau drivers don’t support, I previously needed to install the
NVIDIA graphics drivers.</p>
<p>By entering <kbd>Ctrl</kbd> + <kbd>Alt</kbd> + <kbd>F3</kbd>, I was able to drop
to a shell. When I checked <code>/var/log/Xorg.0.log</code>, I found a message stating that
the NVIDIA driver had failed to load the GLX module, despite earlier messages
that it had been loaded. The message also recommended reinstalling the NVIDIA
driver.</p>
<p>In the same shell, I ran:</p>
<pre tabindex="0"><code>wget http://us.download.nvidia.com/XFree86/Linux-x86_64/349.16/NVIDIA-Linux-x86_64-349.16.run
chmod u+x NVIDIA-Linux-x86_64-349.16.run
sudo service lightdm stop
sudo ./NVIDIA-Linux-x86_64-349.16.run
</code></pre><p>After that, restarting my computer cleared up the issue.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      When I upgraded to Ubuntu 15.04, I was unable to log in. The machine started normally and I was presented with the login window. But when I entered my password, the screen went black for a few moments and then the login screen came back.
Since I’m using an NVIDIA GeForce GTX 750, which Ubuntu’s Nouveau drivers don’t support, I previously needed to install the NVIDIA graphics drivers.
By entering Ctrl + Alt + F3, I was able to drop to a shell.
    </summary>
  </entry>
  
  <entry>
    <title type="html">How to Reset a Lost Password on a LUKS-Encrypted Disk in Ubuntu Linux</title>
    <link href="https://www.benburwell.com/posts/reset-forgotten-password-on-luks-encrypted-ubuntu/" rel="alternate" type="text/html" title="How to Reset a Lost Password on a LUKS-Encrypted Disk in Ubuntu Linux" />
    <published>2015-03-28T00:00:00Z</published>
    <updated>2015-03-28T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/reset-forgotten-password-on-luks-encrypted-ubuntu/</id>
    <content type="html" xml:base="/posts/reset-forgotten-password-on-luks-encrypted-ubuntu/">
      <![CDATA[
        <p>Here’s the situation I recently found myself in:</p>
<ul>
<li>Ubuntu Linux 14.10</li>
<li>Unknown password for user account</li>
<li>Unknown (but set) root password (Ubuntu’s philosophy is to use <code>sudo</code> for everything)</li>
<li>LUKS encrypted filesystem (known passphrase)</li>
<li>Physical access to the computer</li>
</ul>
<p>I needed to reset my account password. Normally, with physical access to a
machine, all bets are off when it comes to security. I tried booting up the
machine into <a href="https://wiki.ubuntu.com/RecoveryMode">recovery mode</a> by holding
down <kbd>shift</kbd> as soon as the BIOS had finished loading. But when I
selected the “Drop to root shell” option, I was prompted to enter the unknown
root password.</p>
<p>My second approach was to boot into single user mode by editing the GRUB command
script.</p>
<p><img src="https://static.benburwell.com/blog/ubuntu-grub.png" alt="Ubuntu’s GRUB menu"></p>
<p>By going down to the recovery mode option and hitting <kbd>e</kbd>, you can edit
the GRUB commands. By adding <code>init=/bin/bash</code> at the end of the line
beginning with <code>linux</code> that specifies the boot image, you can specify
an initial shell to use. Then I hit <kbd>F10</kbd> to boot.</p>
<p>After waiting for about 30 seconds or a minute, I saw a message that waiting for
the root device (the locked disk) had timed out. I was then dumped into an
<a href="https://wiki.ubuntu.com/Initramfs">initramfs</a> shell. From there, I was able to
unlock the disk by running <code>cryptsetup luksOpen /dev/sda3 sda3_crypt</code>.</p>
<p>Next, I mounted the freshly-unlocked disk with <code>mount -o rw /dev/mapper/sda3_crypt /root</code>,
taking advantage of the pre-existing empty directory. From there, I used
<code>chroot</code> to run <code>passwd</code> in the OS.</p>
<pre tabindex="0"><code>$ chroot /root passwd
$ chroot /root passwd myUserName
</code></pre><p>By running these commands, I successfully reset both the root password and
as the password for my account. From there, I was able to restart the machine
and boot normally.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      Here’s the situation I recently found myself in:
Ubuntu Linux 14.10 Unknown password for user account Unknown (but set) root password (Ubuntu’s philosophy is to use sudo for everything) LUKS encrypted filesystem (known passphrase) Physical access to the computer I needed to reset my account password. Normally, with physical access to a machine, all bets are off when it comes to security. I tried booting up the machine into recovery mode by holding down shift as soon as the BIOS had finished loading.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Your Website is not Special, Don’t Make Visitors Make Accounts</title>
    <link href="https://www.benburwell.com/posts/your-website-is-not-special-dont-make-visitors-make-accounts/" rel="alternate" type="text/html" title="Your Website is not Special, Don’t Make Visitors Make Accounts" />
    <published>2015-01-16T00:00:00Z</published>
    <updated>2015-01-16T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/your-website-is-not-special-dont-make-visitors-make-accounts/</id>
    <content type="html" xml:base="/posts/your-website-is-not-special-dont-make-visitors-make-accounts/">
      <![CDATA[
        <p>One of my pet peeves in website usability design is forcing people to create
unnecessary accounts. My recent purchase of some concert tickets from Ticketfly
required me to make an account to buy them. For people who buy a lot of concert
tickets, having an account may make a lot of sense. But for me, as someone who
buys concert tickets at most once every year or two, having an account on a site
that I will probably only use once is not only unnecessary, it’s annoying.</p>
<p>This is not to say that you shouldn’t offer accounts; that would be ridiculous
(depending on the type of site you are running, of course). However, in general,
your users know far better than you do whether or not they actually want or will
use an account. Forcing them to create an account will only drive them away.
People don’t like creating accounts they don’t want to have. There’s really no
reason you can’t have a “check out as guest” option.</p>
<p>And if you do offer accounts, here are a couple of rules to follow to ensure a
good user experience:</p>
<ol>
<li>Allow the option of using a 3rd-party identity provider (OpenID, Facebook,
Google, etc.). Often, visitors don’t want to have yet another
username/password to remember.</li>
<li>Don’t force visitors to use a 3rd-party provider. Always have a local option.
As a counterpoint to (1), many visitors won’t want to use their
Facebook/Google accounts for authenticating to other sites.</li>
<li>Username = Email. Don’t make people remember a username for your site. You
may allow them to pick a username later on that can be used in lieu of their
email address, e.g. as the URL for a profile page, but don’t force them to
use a username to log in.</li>
<li>Don’t make complicated password rules. If you do have password requirements,
show them to the user <em>before</em> they try to make a password. Only telling them
when their password doesn’t fit your requirements causes consternation.</li>
<li>Never <em>ever</em> limit how long a password can be (within reason; obviously you
don’t want to be receiving a megabyte-long password). My bank limits
passwords to 14 characters, which is rather absurd. Since you’re hashing your
passwords anyway, it’s not like you need to allocate extra memory in your
tables to store longer passwords.</li>
<li>Always allow your users to close their account. This should remove all
information about them from your service to the extent possible without
disrupting the integrity of other information.</li>
</ol>
<p>Of course, there are technical details that you need to be watching out for that
are outside the scope of this post. I’ll leave it to you to make sure your
implementation is secure and robust, but I’ll leave you with a few general tips:</p>
<ul>
<li>Don’t invent your own crypto. This applies to protocols, hashing, encryption,
everything.</li>
<li><a href="https://codahale.com/how-to-safely-store-a-password/">Use bcrypt</a>.</li>
<li>Using unsecured HTTP (no SSL/TLS) is inexcusable.</li>
<li>Don’t invent your own crypto.</li>
<li><em>Don’t invent your own crypto.</em></li>
<li><strong><a href="https://codahale.com/how-to-safely-store-a-password/">Use bcrypt</a>.</strong></li>
</ul>
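<p>To illustrate the point in rule 5 that hashing makes password length
irrelevant to storage, here is a minimal Python sketch. It is not bcrypt: since
bcrypt is not in Python’s standard library, this example swaps in
<code>hashlib.scrypt</code>, another deliberately slow password hash, purely so the
sketch is self-contained. The function names and cost parameters are assumptions
for illustration, not tuned recommendations.</p>
<pre tabindex="0"><code>import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest); the digest is 64 bytes whatever the password length."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=64
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the hash with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt)[1], expected)


if __name__ == "__main__":
    for candidate in ("hunter2", "correct horse battery staple " * 10):
        salt, digest = hash_password(candidate)
        print(len(candidate), len(digest), verify_password(candidate, salt, digest))
</code></pre><p>Both the short and the very long password produce a digest of the same size,
so a longer password costs nothing extra in your tables.</p>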

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      One of my pet peeves in website usability design is forcing people to create unnecessary accounts. My recent purchase of some concert tickets from Ticketfly required me to make an account to buy them. For people who buy a lot of concert tickets, having an account may make a lot of sense. But for me, as someone who buys concert tickets at most once every year or two, having an account on a site that I will probably only use once is not only unnecessary, it’s annoying.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Using Showoff for Markdown Presentations</title>
    <link href="https://www.benburwell.com/posts/showoff/" rel="alternate" type="text/html" title="Using Showoff for Markdown Presentations" />
    <published>2014-12-14T00:00:00Z</published>
    <updated>2014-12-14T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/showoff/</id>
    <content type="html" xml:base="/posts/showoff/">
      <![CDATA[
        <p>Recently, I had to give a presentation and decided to do some research on using
Markdown. By coincidence, I had also been looking into
<a href="https://puppetlabs.com">Puppet</a>, a flexible and powerful configuration manager,
when I stumbled across <a href="https://github.com/puppetlabs/showoff">Showoff</a>, another
Puppet Labs project.</p>
<p>Showoff is a Ruby application that takes a Markdown file with some <a href="https://github.com/puppetlabs/showoff/blob/master/documentation/AUTHORING.rdoc">special
formatting</a>
and transforms it into a web-accessible slideshow. As expected, you can open up
a presenter view in your browser. You can also easily open up a second window to
use on your projector in full screen. You can even give your audience the
address for the server so they can follow along on their own screens.</p>
<p>There are also some nice audience interactivity features, like the ability to
ask questions through the web interface. These questions will be shown on the
presenter’s screen. Audience members also have the ability to indicate whether
the presenter is moving too quickly or too slowly so that an adjustment can be
made accordingly.</p>
<p>Finally, Showoff is designed with software presentations in mind, with the
ability to dynamically run Ruby, JavaScript, or CoffeeScript code included in
your slides. You can attach other files or labs to your slides, so audience
members following along on their own devices can easily access reference
materials at the appropriate time.</p>
<p>For a small presentation like the one I was doing, a lot of the more advanced
features of Showoff would have been overkill, but it still made an awesome
presentation method. It was also really neat to be able to say that the slides
were available on GitHub if anyone wanted to look at them afterwards.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      Recently, I had to give a presentation and decided to do some research on using Markdown. By coincidence, I had also been looking into Puppet, a flexible and powerful configuration manager, when I stumbled across Showoff, another Puppet Labs project.
Showoff is a Ruby application that takes a Markdown file with some special formatting and transforms it into a web-accessible slideshow. As expected, you can open up a presenter view in your browser.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Configuring CloudFlare’s Universal SSL</title>
    <link href="https://www.benburwell.com/posts/configuring-cloudflare-universal-ssl/" rel="alternate" type="text/html" title="Configuring CloudFlare’s Universal SSL" />
    <published>2014-10-11T00:00:00Z</published>
    <updated>2015-05-22T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/configuring-cloudflare-universal-ssl/</id>
    <content type="html" xml:base="/posts/configuring-cloudflare-universal-ssl/">
      <![CDATA[
        <p>On September 29, 2014, <a href="https://www.cloudflare.com/">CloudFlare</a>, a web security
company and CDN provider,
<a href="http://blog.cloudflare.com/introducing-universal-ssl/">announced</a> that they
would begin offering free, automatic SSL to all their customers (including those
on their free plan). This is an enormous step forward for enhancing security and
privacy on the Internet; while website owners would previously need to purchase
an SSL certificate for their site and often pay extra for SSL hosting,
CloudFlare now makes this all free. Plus, you get the benefits of their other
services such as DDoS protection.</p>
<p>I’ve previously written about <a href="https://www.benburwell.com/writing/migrating-to-github-pages-and-jekyll/">hosting static sites with GitHub
Pages</a>,
which is what I use for <a href="https://www.benburwell.com">www.benburwell.com</a>. GitHub provides SSL hosting for its
static sites, but not with custom domain names (e.g. <code>https://example.github.io</code>
but <code>http://example.com</code>). Using CloudFlare, it’s possible to use
<code>https://example.com</code> for free. And as a bonus, you won’t need to worry about
DNS hosting either.</p>
<h2 id="what-is-cloudflare">What is CloudFlare?</h2>
<p>CloudFlare works by having all of the traffic for your site routed through
CloudFlare’s network, which provides CDN services such as caching of static
resources, as well as security options like DDoS protection and a Web
Application Firewall (WAF). You’ll need to import your DNS records to CloudFlare
and specify CloudFlare’s DNS servers with your domain registrar to facilitate
the service. Other nice features include apex <code>CNAME</code> records using the <code>@</code>
character (<a href="http://stackoverflow.com/a/16041655">traditionally challenging</a>), as
well as IPv6 DNS support.</p>
<h2 id="setting-up-free-universal-ssl-with-github-pages">Setting Up Free, Universal SSL with GitHub Pages</h2>
<p><em>(Note: you can really do this with any host, but I’m going to be describing how
I did this with my site.)</em></p>
<p>To get started, head over to <a href="https://www.cloudflare.com/sign-up">CloudFlare</a>
and create an account. Next, you’ll specify the website you want to use
CloudFlare with (be sure to use your custom DNS name, not <code>you.github.io</code>).
You’ll have to wait for a few minutes as CloudFlare scrapes your DNS records. Be
sure all of them are there, as any that are missing will stop resolving once
CloudFlare’s name servers take over.</p>
<p>Next, head over to your registrar and change your authoritative name servers to
the ones listed in CloudFlare to start routing your traffic through their
network. This will take some time to propagate through the DNS network, but
should be effective within a few hours. In the meantime, you can take a look at
the three Settings pages. There are many options for optimization, redirects,
caching, security, and more. The important step is to scroll down to the SSL option
and set it to Flexible SSL. Note that even though you can access your GitHub
Pages site over SSL at its <code>github.io</code> address, trying to do so with Full SSL
through CloudFlare will result in an “Unknown Site” error from GitHub.</p>
<p><em>Update on 22 May, 2015:</em> Since this article was published, CloudFlare has
<a href="https://support.cloudflare.com/hc/en-us/articles/205075117-FAQ-New-CloudFlare-Dashboard">updated their dashboard</a>. Now, the settings for SSL are located
under the <a href="https://www.cloudflare.com/a/crypto">“Crypto” tab</a> for your website.  The page rules as described
below are still configured the same way, but now found under the <a href="https://www.cloudflare.com/a/page-rules">“Page Rules”
tab</a>.</p>
<p>On the free tier, CloudFlare states that it will take up to 24 hours to
provision the SSL certificate for your site. In my case, it only took a few
hours. Using one of their paid plans will result in immediate provisioning. You can
check in on whether the certificate has been provisioned by trying to navigate
to <a href="https://yoursite.com">https://yoursite.com</a>. You’ll likely get a domain mismatch SSL error as
CloudFlare defaults to a different certificate until yours has been provisioned.
Once you stop receiving the error, you’re good to go!</p>
<p>The final step is to set up Page Rules (of which you get three for free) to
redirect visitors from the non-secure site to the SSL one. Go to <a href="https://www.cloudflare.com/my-websites">My
Websites</a> and click Page Rules under the
gear icon. Enter the URL patterns to match and flip the “Always use https” switch to
ON.</p>
<p><img src="https://static.benburwell.com/blog/cloudflare_ssl_page_rules.png" alt="Sample CloudFlare page rules for always using SSL"></p>
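<p>To make the effect of an “Always use https” rule concrete, here is a minimal Python sketch of the idea: match the requested URL against a wildcard pattern and, on a match, send the visitor to the same address over HTTPS. The pattern <code>http://*example.com/*</code>, the function name, and the use of <code>fnmatch</code> are assumptions for illustration; this is not CloudFlare’s actual matching code.</p>

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical URL pattern in the style of a page rule; "*" matches any run
# of characters. Substitute your own domain.
PATTERN = "http://*example.com/*"


def https_redirect(url: str, pattern: str = PATTERN) -> Optional[str]:
    """Return the HTTPS URL to redirect to, or None if the rule does not apply."""
    if url.startswith("http://") and fnmatchcase(url, pattern):
        # Swap only the scheme; host, path, and query stay the same.
        return "https://" + url[len("http://"):]
    return None
```

<p>Requests that already use <code>https://</code>, or that are for a different domain, fall through untouched (the function returns <code>None</code>).</p>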
<p>That’s it! You’ve taken an important step towards making the web browsing
experience more secure and private for your visitors.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      On September 29, 2014, CloudFlare, a web security company and CDN provider, announced that they would begin offering free, automatic SSL to all their customers (including those on their free plan). This is an enormous step forward for enhancing security and privacy on the Internet; while website owners would previously need to purchase an SSL certificate for their site and often pay extra for SSL hosting, CloudFlare now makes this all free.
    </summary>
  </entry>
  
  <entry>
    <title type="html">LESS File Compilation for Jekyll and GitHub Pages</title>
    <link href="https://www.benburwell.com/posts/less-file-compilation-for-jekyll-github-pages/" rel="alternate" type="text/html" title="LESS File Compilation for Jekyll and GitHub Pages" />
    <published>2014-05-31T00:00:00Z</published>
    <updated>2014-05-31T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/less-file-compilation-for-jekyll-github-pages/</id>
    <content type="html" xml:base="/posts/less-file-compilation-for-jekyll-github-pages/">
      <![CDATA[
        <p>I recently wrote about <a href="/writing/migrating-to-github-pages-and-jekyll">migrating my website to GitHub
Pages</a> and noted that I wasn’t
completely satisfied with my deployment workflow. Ideally, <a href="http://www.joelonsoftware.com/articles/fog0000000043.html">creating a build
should be done in a single
step</a>. As I wrote, my
previous build workflow required me to manually compile my
<a href="http://lesscss.org">LESS</a> files before committing if I’d made changes. While my
stylesheet doesn’t change often, this method is certainly not ideal.</p>
<p>Using <a href="http://git-scm.com/book/en/Customizing-Git-Git-Hooks">Git hooks</a>, it’s
possible to run a script at certain points during the Git workflow. To take
advantage of this in my case, I added a small shell script to
<code>.git/hooks/pre-commit</code>:</p>
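<pre tabindex="0"><code>#!/bin/sh
# Recompile the LESS stylesheet and stage the result before each commit.

# Put /usr/local/bin on the PATH so the hook can find lessc.
export PATH=/usr/local/bin:$PATH

# Compile and minify style.less; a non-zero exit aborts the commit.
cd /Users/Ben/Documents/Code/benburwell.github.io/assets/less || exit 1
lessc --clean-css style.less ../css/style.css || exit 1

# Stage the compiled stylesheet so it is included in this commit.
cd /Users/Ben/Documents/Code/benburwell.github.io || exit 1
git add /Users/Ben/Documents/Code/benburwell.github.io/assets/css/style.css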
</code></pre><p>This is a pretty rough script, but it gets the job done for me. For a much more
thorough script, see <a href="http://tjvantoll.com/2012/07/07/the-ideal-less-workflow-with-git/">this article by TJ
VanToll</a>.</p>
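<p>One way to make the hook less dependent on a particular machine’s directory layout is to ask Git for the repository root instead of hard-coding it. The following is a rough Python sketch of that idea, not the script I actually use; the function names are made up for this example, and it assumes <code>lessc</code> and <code>git</code> are on the <code>PATH</code>.</p>

```python
import subprocess
from pathlib import Path


def repo_root() -> Path:
    """Ask Git for the top-level directory of the current repository."""
    out = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        check=True, capture_output=True, text=True,
    )
    return Path(out.stdout.strip())


def hook_commands(root: Path) -> list:
    """Build the commands the hook runs, using the same layout as above."""
    less_file = root / "assets" / "less" / "style.less"
    css_file = root / "assets" / "css" / "style.css"
    return [
        # Compile and minify the stylesheet.
        ["lessc", "--clean-css", str(less_file), str(css_file)],
        # Stage the result so it is part of the commit.
        ["git", "add", str(css_file)],
    ]


def main() -> int:
    for command in hook_commands(repo_root()):
        # A non-zero exit status from the hook aborts the commit.
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

<p>To use it as a hook, the file would be saved as <code>.git/hooks/pre-commit</code>, made executable, given a <code>#!/usr/bin/env python3</code> first line, and end with <code>raise SystemExit(main())</code>.</p>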

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I recently wrote about migrating my website to GitHub Pages and noted that I wasn’t completely satisfied with my deployment workflow. Ideally, creating a build should be done in a single step. As I wrote, my previous build workflow required me to manually compile my LESS files before committing if I’d made changes. While my stylesheet doesn’t change often, this method is certainly not ideal.
Using Git hooks, it’s possible to run a script at certain points during the Git workflow.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Enhancing Printing at Muhlenberg</title>
    <link href="https://www.benburwell.com/posts/printing-at-muhlenberg/" rel="alternate" type="text/html" title="Enhancing Printing at Muhlenberg" />
    <published>2014-05-03T00:00:00Z</published>
    <updated>2014-05-03T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/printing-at-muhlenberg/</id>
    <content type="html" xml:base="/posts/printing-at-muhlenberg/">
      <![CDATA[
        <p>A common frustration of Muhlenberg students is to print a document to a dorm
printer only to find that the printer had no paper when going to collect it.
This leads to both frustration and wasted paper, since when more paper is put
into the printer, it will print out all the queued jobs from when the tray was
empty. By that time, students have often given up and printed their document to
another printer.</p>
<p>To avoid this, I created a web page that <a href="http://mathcs.muhlenberg.edu/~bb246500/printers/">reports the status of Muhlenberg
printers</a>. The PHP script
queries the printers to determine the status of their trays. If you’d like to
see other printers added, let me know <a href="mailto:hi@benburwell.com">by email</a> or
<a href="https://twitter.com/intent/tweet?text=@bburwell">on Twitter</a>.</p>
<h2 id="dns-names">DNS Names</h2>
<p>To facilitate printing from personal computers, I created DNS records for
several printers which enable them to be configured with a logical name rather
than by IP address. Currently, the following printers/DNS names are available:</p>
<ul>
<li><code>trumbower48.print.muhlenberg.benburwell.com</code></li>
<li><code>trumbower125.print.muhlenberg.benburwell.com</code></li>
<li><code>trumbower147.print.muhlenberg.benburwell.com</code></li>
</ul>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      A common frustration of Muhlenberg students is to print a document to a dorm printer only to find that the printer had no paper when going to collect it. This leads to both frustration and wasted paper, since when more paper is put into the printer, it will print out all the queued jobs from when the tray was empty. By that time, students have often given up and printed their document to another printer.
    </summary>
  </entry>
  
  <entry>
    <title type="html">Migrating to GitHub Pages and Jekyll</title>
    <link href="https://www.benburwell.com/posts/migrating-to-github-pages-and-jekyll/" rel="alternate" type="text/html" title="Migrating to GitHub Pages and Jekyll" />
    <published>2014-05-01T00:00:00Z</published>
    <updated>2014-05-01T00:00:00Z</updated>
    <id>https://www.benburwell.com/posts/migrating-to-github-pages-and-jekyll/</id>
    <content type="html" xml:base="/posts/migrating-to-github-pages-and-jekyll/">
      <![CDATA[
        <p>I’ve always been a fan of using
<a href="http://daringfireball.net/projects/markdown/">Markdown</a> to create web content.
Several years ago, I created <a href="/projects/mdengine/">MDEngine</a>, a small PHP script
to render Markdown files in HTML dynamically. For a while, it was responsible
for much of the content on my website. In October 2013, I began work on a fresh
design. I decided to use a custom Node.js app deployed on Heroku for processing
the Markdown. While this worked effectively, I always had some reservations.</p>
<p>While my site was decently fast, there was no real reason that it needed to be
dynamically generated. I was particularly concerned with the performance of the
two list pages, whose backend logic consisted of parsing an entire directory of
Markdown files each time it was loaded. Though there was no noticeable
performance impact, it was not inconceivable that the page generation time would
increase substantially as content grew.</p>
<p>In late April 2014, I made some design updates to the site running on Heroku. I
decided to take the opportunity to address my performance concerns as well.
While my original intent was to simply clean up the server logic I had written,
I realized that it would be more sustainable in the long term to migrate to a
true static site using <a href="http://jekyllrb.com">Jekyll</a>.</p>
<h2 id="the-setup">The Setup</h2>
<p>Installing Jekyll locally was a piece of cake; simply running <code>gem install jekyll</code> did the trick. I already had a placeholder page in my
<a href="https://github.com/benburwell/benburwell.github.io">benburwell.github.io repo</a>,
so I <code>cd</code>’d to the parent directory and ran <code>jekyll new benburwell.github.io</code> to
overwrite the old content.</p>
<p>For those unfamiliar with <a href="https://pages.github.com">GitHub Pages</a>, anything
that you put in a repo named <code>[your username].github.io</code> will automatically be
served from that URL. You can also create branches named <code>gh-pages</code> in your
other repos to serve project-specific sites. In addition to serving static
content, GitHub Pages will automatically compile sites generated with Jekyll.</p>
<h2 id="porting-content">Porting Content</h2>
<p>Next came what was probably the most time-consuming part of the whole process:
converting the <a href="http://jade-lang.com">Jade</a> layout into pure HTML with
<a href="http://liquidmarkup.org">Liquid</a> markup. Luckily, this wasn’t too painful, and
I came out with <a href="https://github.com/benburwell/benburwell.github.io/tree/master/_layouts">two
layouts</a>,
one for the page structure and navigation, and the other for displaying Posts.</p>
<p>My next challenge was to maintain my link structure so nothing would be broken.
The one exception I conceded was my résumé, a PDF file that I had been
serving from <code>/resume/</code> using Express (admittedly a pretty poor idea). After
exploring the Jekyll documentation, I discovered that an easy way to separate
out my content into Writing and Projects as I’ve done on my site was to use the
built-in category functionality. I would simply create two category pages at
<a href="https://github.com/benburwell/benburwell.github.io/blob/master/writing/index.html"><code>/writing/index.html</code></a>
and
<a href="https://github.com/benburwell/benburwell.github.io/blob/master/projects/index.html"><code>/projects/index.html</code></a>
to render a list of posts from their respective categories, and tag each
Markdown document with the appropriate category. The final step was to define my
permalink structure in <code>_config.yml</code>, which I did by adding <code>permalink: /:categories/:title/</code> to the file.</p>
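<p>As a rough illustration of how that template turns into URLs, here is a small Python sketch. The function and its arguments are invented for this example and only mimic the two placeholders used above; Jekyll’s real permalink handling supports more placeholders and edge cases.</p>

```python
def expand_permalink(template: str, categories: list, title_slug: str) -> str:
    """Fill in the :categories and :title placeholders of a permalink template."""
    # Multiple categories become nested path segments.
    url = template.replace(":categories", "/".join(categories))
    return url.replace(":title", title_slug)


# A post in the "writing" category, with the slug taken from its filename.
print(expand_permalink(
    "/:categories/:title/",
    ["writing"],
    "migrating-to-github-pages-and-jekyll",
))
# prints /writing/migrating-to-github-pages-and-jekyll/
```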
<p>I next had the pleasure of renaming all of my content files to adhere to
Jekyll’s naming convention (<code>YYYY-MM-DD-hyphen-separated-title.markdown</code>) and
adding/modifying the front matter as necessary.</p>
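<p>The renaming itself is mechanical enough to script. Below is a minimal Python sketch that builds a filename in that format from a date and a title; the slug rule (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen) is an assumption for illustration, not necessarily how Jekyll itself slugifies titles.</p>

```python
import re
from datetime import date


def jekyll_filename(published: date, title: str) -> str:
    """Return a YYYY-MM-DD-hyphen-separated-title.markdown filename."""
    # Lowercase the title and collapse anything that is not a letter or
    # digit into single hyphens, trimming hyphens from both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{published.isoformat()}-{slug}.markdown"


print(jekyll_filename(date(2014, 5, 1), "Migrating to GitHub Pages and Jekyll"))
# prints 2014-05-01-migrating-to-github-pages-and-jekyll.markdown
```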
<h2 id="additional-configuration">Additional Configuration</h2>
<p>I decided to <a href="https://help.github.com/articles/using-jekyll-plugins-with-github-pages">enable the <code>jekyll-sitemap</code>
plugin</a>
by adding <code>jekyll-sitemap</code> as a gem to <code>_config.yml</code>. This plugin will generate
<a href="http://www.sitemaps.org">an XML sitemap</a> that can be used by crawlers such as
those run by search engines to help determine what content needs to be indexed.</p>
<p>I moved my error page over and quickly translated the Jade to Markdown by
<a href="https://help.github.com/articles/custom-404-pages">following the instructions provided by
GitHub</a> for creating a custom
404 page. The only remaining issue was my stylesheet problem. In my Express app,
I used <a href="http://lesscss.org">Less</a> for writing my stylesheets. As of this
writing, Jekyll does not support compiled stylesheet languages like Less, though
<a href="http://jekyllrb.com/docs/assets/">there is the suggestion of future support</a>
for Sass and CoffeeScript.</p>
<p>For now, I’m keeping my stylesheets in <code>/assets/less/</code> and compiling them down
to a CSS file locally after making changes with <code>lessc --clean-css style.less ../css/style.css</code>. While this certainly isn’t perfect, it allows me to keep my
Less files intact and to serve minified CSS.</p>
<h2 id="conclusion">Conclusion</h2>
<p>All in all, the process went very smoothly. I made <a href="https://github.com/benburwell/benburwell.github.io/tree/042ebd011194592ec155181dc41976493a07e54a">the first Jekyll
commit</a>
at 18:52 and <a href="https://github.com/benburwell/benburwell.github.io/tree/35c2061dd13427b1b48525321f7f0156f0b83863">changed my DNS records from
Heroku</a>
at 21:20, spending about two and a half hours learning Jekyll and converting my
site over. This is a pretty rapid deployment — kudos to Jekyll for building such
an easy tool.</p>
<p>As far as the future goes, I’d like to see GitHub Pages provide native support
for a stylesheet language, be it Less, Sass, or some other one. Additionally,
I’d like to see an HTML minification plugin (a minor optimization, but not
unreasonable). For the time being, I’m quite happily serving this site with
GitHub Pages.</p>

      ]]>
    </content>
    <author>
      <name>Ben Burwell</name>
    </author>
    <summary type="html">
      I’ve always been a fan of using Markdown to create web content. Several years ago, I created MDEngine, a small PHP script to render Markdown files in HTML dynamically. For a while, it was responsible for much of the content on my website. In October 2013, I began work on a fresh design. I decided to use a custom Node.js app deployed on Heroku for processing the Markdown. While this worked effectively, I always had some reservations.
    </summary>
  </entry>
  
</feed>
