Running pip in isolated mode also passes the --no-user-cfg
flag to Python’s distutils to disable reading
the per-user ~/.pydistutils.cfg. But that flag isn’t available in Python
versions 3.0–3.3, causing the error above.
I ran into this because I recently migrated the Python code that generates
this site to run under Python 3.x. I’m using a virtualenv setup, so once I
had everything working under both Python versions, I was reasonably
confident that I could switch ‘production’ (i.e. the Compute Engine
instance that serves this site) to Python 3 and discard the Python 2 setup.
Good thing I tested it out first, since it didn’t even install.
It turns out that:

- virtualenv ships an embedded copy of pip and setuptools, but setuptools
  will use the system version of distutils1, and
- --no-user-cfg was added in Python 2.7, but wasn't ported to 3.x until
  Python 3.42, and
- the distribution I'm using on my real server (Debian 7) ships with Python
  3.2.3, rather than the 3.4.x I'm using elsewhere.
I worked around this by just omitting the --isolated flag for Python
versions [3.0, 3.4) — though since I don't actually have any system config
files in practice, I probably could have set PIP_CONFIG_FILE=/dev/null
instead (which has the effect of ignoring all config files).
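The version gate amounts to something like the following sketch (the function name and the idea of returning a flag list are mine, for illustration; the real check lives wherever you construct the pip command line):

```python
import sys


def extra_pip_flags(version_info=sys.version_info):
    """Return extra flags to pass to pip, omitting --isolated on
    Python [3.0, 3.4), where distutils lacks --no-user-cfg support."""
    major, minor = version_info[0], version_info[1]
    if major == 3 and minor < 4:
        # Passing --isolated here would make pip hand distutils a
        # flag it doesn't understand, so skip it.
        return []
    return ["--isolated"]
```

Alternatively, setting the `PIP_CONFIG_FILE=/dev/null` environment variable before invoking pip covers the "ignore all config files" part without touching distutils at all.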
I’m not the first person to have noticed that virtualenv isn’t actually
hermetic. Though some of that rant is out of date now (Python
wheel files provide prebuilt binaries), and some isn’t relevant to the way
I’m using virtualenv/pip, it’s definitely true that the dependency on the
system Python libraries is the main reason I'd look to something more like
Docker or Vagrant for deployment were I doing this professionally.
So did I finally manage to switch to Python 3.x after that? Not even close:
Python 3.x didn’t gain the ability to (redundantly) use the u'foo' syntax
for Unicode strings until 3.3, and some of my dependencies use that syntax.
So I’m waiting until I can switch to Debian 8 on Compute
Engine3, at which point I can cleanly assume Python 3.4 or later.
This is a rant for another day, but it looks like virtualenv monkeypatches pip, which monkeypatches setuptools, which either monkeypatches or builds upon distutils. Debugging through this edifice of patched abstractions is… not easy. ↩
It’s a bit more complex than that: Python 3.0 and 3.1 were released first, then the feature was implemented in both 2.7 and 3.2, but then distutils as a whole was rolled back to its 3.1 state before 3.2 was released. That rollback was reverted for Python 3.4. ↩
I can apt-get dist-upgrade from the Debian 7 image just fine, but it’s a bit slow and hacky, so I’d rather wait for official images. (I also need to fix some custom mail-related configuration that appears to have broken under Debian 8.) ↩
It’s been a while since the last Noda Time release, and while we’re
still working towards 2.0, we’ve been collecting a few bug fixes that can’t
really wait. So last Friday1, we released Noda Time 1.3.1.
Noda Time 1.3.1 updates the built-in version of TZDB from 2014e to 2015a,
and fixes a few minor bugs, two of which were triggered by recent data changes.
Since it’s been a while since the previous release, it may be worth pointing
out that new Noda Time releases are not the only way to get new time zone
data: applications can choose to load an external version of the time zone
database rather than use the embedded version, and so use
up-to-date time zone data with any version of the Noda Time assemblies.
If you’re in a hurry, you can get Noda Time 1.3.1 from the NuGet repository
(core, testing, JSON support
packages), or from the links on the Noda Time home page. The
rest of this post talks about the changes in 1.3.1 in a bit more detail.
End of year transitions (Bangladesh)
In the middle of 2009, Bangladesh started observing permanent daylight
saving time, as an energy-saving measure. This was abandoned at the end of
that year, and the country went back to permanent standard time.
Until recently, that transition back to standard time was actually recorded
as happening a minute too early, at 23:59 on December 31st. TZDB
2014g fixed this by changing the transition time to “24:00” — that is,
midnight at the end of the last day of the year.
Noda Time could already handle transitions at the end of the day, but would
incorrectly ignore this particular transition because it occurred ‘after’
2009. That’s now fixed, and Noda Time 1.3.1 returns the correct offset for
Asia/Dhaka when using data from TZDB 2014g or later.
BCL provider: historical changes to the base offset (Russia)
In October 2014, most of Russia switched from permanent daylight saving time
to permanent standard time, effectively moving local time back one hour.
These changes were included in TZDB 2014f.
For people using the BCL provider instead of the TZDB
provider (and using Windows), Microsoft delivered a
hotfix in September 2014. However, our BCL provider depends upon the .NET
framework’s TimeZoneInfo class, and the .NET framework — unlike TZDB —
is unable to represent historical changes to the ‘base’ offset of a time
zone (as happened here).
The result is that Noda Time (and other applications using
TimeZoneInfo in .NET 4.5.3 and earlier) incorrectly compute the offset for
dates before October 26th, 2014.
A future update of the .NET framework should correct this limitation, but
without a corresponding change in Noda Time, the extra information wouldn’t
be used; Noda Time 1.3.1 prepares for this change, and will use the correct
offset for historical dates when TimeZoneInfo does.
BCL provider: time zone equality
The time zones returned by the BCL provider have long had a limitation in
the way time zone equality was implemented: a BCL time zone was considered
equal to itself, and unequal to a time zone returned by a different
provider, but attempting to compare two different BCL time zone instances
for equality always threw a NotImplementedException. This was
particularly annoying for ZonedDateTime, as its equality is defined in
terms of the contained DateTimeZone.
This was documented, but we always considered it a bug, as it wasn’t
possible to predict whether testing for equality would throw an exception.
Noda Time 1.3.1 fixes this by implementing equality in terms of the
underlying TimeZoneInfo: BCL time zones are considered equal if they wrap
the same underlying TimeZoneInfo instance.
Note that innate time zone equality is not really well defined in general,
and is something we’re planning to reconsider for Noda Time 2.0. Rather
than rely on DateTimeZone.Equals(), we’d recommend that applications that
want to compare time zones for equality use
ZoneEqualityComparer to specify how two time
zones should be compared.
There are a handful of other smaller fixes in 1.3.1: the NodaTime assembly
correctly declares a dependency on System.Xml, so you won’t have to; the
NuGet packages now work with ASP.NET’s kpm tool, and declare support for
Xamarin’s Xamarin.iOS (for building iOS applications using C#)
in addition to Xamarin.Android, which was already listed; and we’ve fixed a
few reported documentation issues along the way.
Work is still continuing on 2.0 along the lines described in our 1.3.0
release post, and we’re also planning a 1.4 release
to act as a bridge between 1.x and 2.0. This will deprecate members that
we plan to remove in 2.0 and introduce the replacements where feasible.
Release late on Friday afternoon? What could go wrong? Apart from running out of time to write a blog post, I mean. ↩
Noda Time’s repository is a lot simpler than Subversion’s (it’s also at
least an order-of-magnitude smaller), so it wasn’t that difficult to come up
with a measure of code size: I just counted the lines in the .cs files
under src/NodaTime/ (for the production code) and src/NodaTime.Test/
(for the test code).
I decided to exclude comments and blank lines this time round, because I
wanted to know about the functional code, not whether we’d expanded our
documentation. As it turns out, the proportion of comments has stayed about
the same over time, but that ratio is very different for the production code
and test code: comments and blank lines make up approximately 50% of the
production code, but only about 20–25% of the test code.
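The counting itself is simple enough to sketch. This is an approximation rather than the exact script used for the graphs: it treats lines starting with // as comments and ignores /* */ block comments entirely.

```python
from pathlib import Path


def count_code_lines(root):
    """Count non-blank, non-comment lines in the .cs files under root.

    Approximate: a line whose stripped form starts with // (including
    /// doc comments) counts as a comment; /* */ blocks are not handled.
    """
    total = 0
    for path in sorted(Path(root).rglob("*.cs")):
        for line in path.read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("//"):
                total += 1
    return total
```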
Here’s the graph. It’s not exactly up-and-to-the-right, more…
There are some things that aren't surprising: during the pre-1.0 betas (the
first two unlabelled points) we actively pruned code that we didn’t want to
commit to for 1.x2, so the codebase shrinks until we
release 1.0. After that, we added a bunch of functionality that we’d been
deferring, along with a new compiled TZDB file format for the PCL
implementation. So the codebase grows again for 1.1.
But then with 1.2, it shrinks. From what I can see, this is mostly due to
an internal rewrite that removed the concept of calendar ‘fields’ (which had
come along with the original mechanical port from Joda Time). This seems to
counterbalance the fact that at the same time we added support for
serialization3 and did a bunch of work on parsing and formatting.
1.3 sees an increase brought on by more features (new calendars and APIs),
but then 2.0 (at least so far) sees an initial drop, a steady increase due
to new features, and (just last month) another significant drop.
The first decrease for 2.0 came about immediately, as we removed code that
was deprecated in 1.x (particularly, the handling for 1.0’s
non-PCL-compatible compiled TZDB format). Somewhat surprisingly, this
doesn’t come with a corresponding decrease in our test code size, which has
otherwise been (roughly speaking) proportional in size to the production
code (itself no real surprise, as most of our tests are unit tests). It
turns out that the majority of this code was only covered by an integration
test, so there wasn’t much test code to remove.
There’s no real decrease in test code size though: most of the C# 6 features
are really only useful for production code.
All in all, Noda Time’s current production code is within 200 lines of where
it was back in 1.0.0-beta1, which isn’t something I would have been able to
predict. Also, while we don’t quite have more test code than production
code yet, it’s interesting to note that we’re only about a hundred lines
Does any of this actually matter? Well, no, not really. Mostly, it was a
fun little exercise in plotting some graphs.
It did remind me that we have certainly simplified the codebase along the
way — removing undesirable APIs before 1.0 and removing concepts (like
fields) that were an unnecessary abstraction — and those are definitely
good things for the codebase.
And it’s also interesting to see how effective the syntactic sugar in C# 6
is in reducing line counts, but the removal of unnecessary text also
improves readability, and it’s that that’s the key part here rather than the
number of lines of code that results.
But mostly I just like the graphs.
Or, if you prefer BuzzFeed-style headlines, “You won’t believe what happened to this codebase!”. ↩
To get to 1.0, we removed at least: a verbose parsing API that tried to squish the Noda Time and BCL parsing models together, an in-code type-dependency graph checker, and a very confusingly-broken CultureInfo replacement. ↩
I’m not counting the size of the NodaTime.Serialization.JsonNet package here at all (nor the NodaTime.Testing support package), so this serialization support just refers to the built-in XML and binary serialization. ↩
If you have a privileged process that needs to invoke a less-trusted child
process, one easy way to reduce what the child is able to do is to run it
under a separate user account and use ssh to handle the delegation.
This is pretty simple stuff, but as I’ve just wasted a day trying to achieve
the same thing in a much more complicated way, I’m writing it up now to make
sure that I don’t forget about it again.
(Note that this is about implementing privilege separation using ssh, not
about how ssh itself implements privilege separation; if you came here for
that, see the paper Preventing Privilege Escalation by Niels Provos et al.)
In my case, I’ve been migrating my home server to a new less
unhappy machine, and one of the things I thought I’d clean up
was how push-to-deploy works for this site, which is stored in a Mercurial
repository.
What used to happen was that I’d push from wherever I was editing, over ssh,
to a repository in my home directory on my home server, then a changegroup
hook would update the working copy (hg up) to include whatever I'd
just pushed, and run a script (from the repository) to deploy to my
webserver. The hook script that runs sends stdout back to me, so I also
get to see what happened.
(This may sound a bit convoluted, but I’m not always able to deploy directly
from where I’m editing to the webserver. This also has the nice property
that I can’t accidentally push an old version live by running from the wrong
place, since history is serialised through a single repository.)
The two main problems here are that pushing to the repository has the
surprising side-effect of updating the working copy in my home directory
(and so falls apart if I accidentally leave uncommitted changes lying
around), and that the hook script runs as the user who owns the repository
(i.e. me), which is largely unnecessary.
For entirely separate reasons, I’ve recently needed to set up shared
Mercurial hosting (which I found to be fairly simple, using
mercurial-server), so I now have various repositories owned by a single
shared user.
I don’t want to run the (untrusted) push-to-deploy scripts directly as that
shared user, because they’d then have write access to all repositories on
the server. (This doesn’t matter so much for my repositories, since only I
can write to them, and it's my machine anyway, but it will for some of the
shared repositories.)
In other words, I want a way to allow one privileged process (the Mercurial
server-side process running as the hg user) to invoke another (a
push-to-deploy script) in such a way that the child process doesn’t retain
the first process’s privileges.
There are lots of ways to achieve this, but one of the simplest is to run
the two processes under different user accounts, then either find a way to
communicate between two always-running processes (named pipes or shared
memory, for example), or for one to invoke the other directly.
The latter is more appropriate in this case, and while the obvious way for a
(non-root) user to run a process as another is via sudo, the policy
specification for that (in /etc/sudoers) is… complicated. Happily,
there’s a simpler way that only requires editing configuration files owned
by the two users in question: ssh.
The setup is fairly easy: I’ve created a separate user that will run the
push-to-deploy script (hg-blog), generated a password-less keypair for
the calling (hg) user, and added the public key (with from= and
command= options) to /home/hg-blog/.ssh/authorized_keys.
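As an illustration (the source address, command path, and key are hypothetical, not my actual setup), the authorized_keys entry looks something like this:

```
# /home/hg-blog/.ssh/authorized_keys
# from= restricts which hosts may use the key; command= forces what runs,
# regardless of what the client asked for.
from="127.0.0.1,::1",command="/home/hg-blog/bin/deploy",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAA... hg@homeserver
```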
Now the Mercurial server-side process can trigger the push script simply by
creating a $REPOS/.hg/hgrc containing:
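A minimal version would be something like this (the hostname is an assumption; anything that resolves to the machine running sshd works):

```ini
[hooks]
# After each push, ssh to the deploy user. What actually runs is fixed
# by the command= option in hg-blog's authorized_keys, not by this line.
changegroup = ssh hg-blog@localhost
```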
This automatically runs the command I specified in the target user’s
authorized_keys, so I don’t even have to worry about listing it
In conclusion, ssh is a pretty good tool for creating a simple privilege
separation between two processes. It's ubiquitous, it doesn't require
root to do anything special, and while the case I'm using it for here
involves two processes on the same machine, there’s actually no reason that
they couldn’t be on different machines.
The ‘right’ answer may well be to run each of these as Docker containers,
completely isolating them from each other. I’m not at that point yet, and
in the meantime, hopefully by writing this up I won’t forget about it the
next time I need to do something similar!
In this case, adding a command restriction doesn’t protect against a malicious caller, since the command that’s run immediately turns around and fetches the next script to run from that same caller. It does protect against someone else obtaining the (password-less by necessity) keypair, I suppose, though the main reason is the one listed above: it means that ‘what to do when something changes’ is specified entirely in one place. ↩