SICPers podcast episode 4

We’re back to Amiga-Smalltalk today, as the moment when it runs on a real Amiga inches closer. Listen here.

I think I’ve isolated all extraneous sound except the nearby motorway, which I can’t do much about. I hope the experience is better!

Posted in Amiga, podcast | Leave a comment

Episode 4: do you C what I C?

We’re getting closer to running Amiga-Smalltalk actually on an Amiga. Full show notes

Leave a comment

SICPers Podcast Episode 3

The latest episode of SICPers, in which I muse on what programming 1980s microcomputers taught me about reading code, is now live. Here’s the podcast RSS feed.

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration. Edsger Dijkstra, “How do we tell truths that might hurt?”

As always, your feedback and suggestions are most welcome.

Posted in podcast | Leave a comment

Episode 3: Graham’s so basic

What did programming on a microcomputer with a 6809 CPU and 32k of RAM teach me about reading code? Full show notes

Leave a comment

SICPers Podcast episode two

Episode 2 is live!

The only link I promised this week is to the BCHS web application stack. Short for BSD, C, httpd, sqlite, it’s a minimalist approach to making web applications.

As ever, your feedback is welcome, here or wherever you find me.

Posted in podcast | Leave a comment

Episode 2: From Programming to Software Engineering

Graham discusses the experience of taking on a project that has been started by an enthusiastic novice and getting ready to engineer changes.

Full show notes: https://www.sicpers.info/2020/04/sicpers-podcast-episode-two/

Leave a comment

SICPers podcast episode one

I made a podcast! Full show notes here due to the character limit at podbean.

  • Amiga-Smalltalk project on GitHub

  • Free books on Smalltalk: the three Addison-Wesley books “Smalltalk-80: The Interactive Programming Environment”, “Smalltalk-80: The Language and its Implementation” and “Smalltalk-80: Bits of History, Words of Advice” are mentioned in this podcast, and the second of those is “the blue book” at the centre of the episode.

  • AROS Research Operating System is the Amiga-compatible open source operating system. It can boot on (i386 or m68k) hardware, or run hosted in Linux, FreeBSD, macOS, and Windows.

Please let me know what you think! You can find me on twitter at @iwasleeg, and I gave out my email address in the podcast. You could also comment here!

Errata: I said the Amiga 1000 had 128kB of RAM but it had 256kB, sorry!

Posted in podcast | Leave a comment

Episode 1: Amiga-Smalltalk

Graham describes what he’s learned about object-oriented programming and memory management from the beginning of his project to write a Smalltalk-80 implementation.

Full show notes: https://www.sicpers.info/2020/04/sicpers-podcast-episode-one/

1 Comment

Stay on target…

I introduce the kind of customer who needs the Labrary’s advice with the following description:

Your software team was a sight to behold, when it started out. You very quickly got to an MVP, validated its fit with early successes, iterated on the user experience and added the missing features. You hired a few more developers to cover the demand.

Now, things are starting to feel slower. The team insists they’re still continuing apace, but you haven’t kept that initial excitement. Developers are grumbling about technical debt. The backlog keeps growing. Testers aren’t keeping up – despite automation. The initial customers aren’t getting the benefits they first expected, and new customers aren’t being won at the rate you’d like.

The problem was that the way you hit it out of the park worked well in the early days, when you had a green field project and no existing code or customers to support. Now your customers expect all new features and surprising and delightful interactions: but they also expect nothing to change, and certainly not to break. The desired qualities of your software have changed, and so the quality of your software must change.

Plenty of people, typically CTOs and heads of software development, typically at growth scale, identify with this description, so it’s worth digging deeper into how it comes about.

At the early stage, your company has a small team, a vague idea of what the product is, and no customers. Literally anything your engineering team can do will be valuable: it will either be a product that fits the market, or tell you where the market isn’t. Obviously there’s some hand-waving about being able to market and sell whatever it is that your engineering team build, but by and large anything you come up with is somehow useful. You are either defining a new market, in which case all work is market-leading, or entering an established market, in which case your direction is clear. It’s hard to make a wrong decision at this stage, but very easy to stick to one.

And while we all know the horror stories about shipping your prototype to production, it’s actually not a bad plan at this stage. You don’t know what will or won’t work, so getting something out there quickly is exactly the right thing to do. And your developers probably have a base standard of maturity even for prototype projects, so you’ll have version control, some form of testing infrastructure and CI, external libraries for data storage, it won’t be a complete wild west.

Things go, loosely speaking, in one of two directions here. If you fail to find the right customers and the right product, you’re out of money, thanks for playing. If you find the right product for the right people, then congratulations! You get more money, either through revenue or a funding round, and you grow the company. Maybe the programmer you were paying before becomes the CTO, maybe one or two of the contractors you worked with come on as perms, and you get a couple of new people come on in return for an OK salary and the promise of the stock sometime being worth something. One of those people is even a QA!

Of course, the cash injection (particularly if it came as a lump through funding that can be drawn down as necessary) gives you the headroom to do things properly. Technologies are chosen, an architecture is designed (usually just by connecting the technologies with arrows), and an attempt is made to build the new thing, support the old thing, and continue adding new features and differentiate in the market. A key customer demographic is sold the promise of the new system (it being exciting and more capable, at least that is what the roadmap says), takes delivery of the old system (it being ready), then takes up time asking for the new features. You either divert resources from the new system to the old to add the features there, or invent some unholy hybrid where your existing thing makes calls to the new thing for the new features with a load of data consistency glue binding the two together. We’ll call these customers “saps” for now. Also, whether you’ve caught up to your competitors or your competitors to them, you’re now having to maintain an edge.

Let’s take stock here. You have:

  1. Some saps, giving us actual money, on the old platform.
  2. Some hope that things will be easier once everything’s on the new platform.
  3. Pressure to stay ahead of/catch up to the competition in both places.

That’s more work! But it’s OK, you’ve got more people. But where you add to the old system (which pleases your saps) you take away from the new, so tend to favour unintrusive patches rather than deeper changes there. Which makes it harder to understand, and harder to support, which is more work! OK, so hire more people! But now engineering costs more. OK, so sell the original thing (not the new thing, it isn’t ready yet) to a few more customers! But now there are more customers, demanding more features, and more support. OK, so hire more people!

Run through that cycle a few times and you end up in the place I described in the Labrary blurb. You’ve got a big team, filled with capable engineers, working hard, and delivering…not as much as anyone would like. The problem is that working on the software is pulling them away from working for the company. The other problem is that you’re measuring how much the software gets worked on.

Posted in agile, process, team | Leave a comment

On the tyranny of autoincrementing integer primary keys

In designing a relational database schema, many people will automatically create a column id integer primary key for every table, using their database’s automatic increment feature to assign a new value to each row they insert. My assertion is that this choice of primary key should be the last resort, not the first.

A database schema is a design artifact, describing the data we want to store and the relationships between records (rows) in those data. It is also meta-design, because the schema constrains us in designing the queries we use to work with the data. Using the same, minimal-effort primary key type for every table then avoids communicating information about the structure and meaning of the data in that table and imposes irrelevant features in the queries we design.

The fact that people use the name id for this autoincrementing integer field gives away the fact that the primary key is used to identify a row in a database. The primary key for a table should ideally be the minimal subset of relevant information that uniquely identifies an individual record. If you have a single column, say name, with not null and unique constraints, that’s an indicator (though not a cast-iron guarantee) that this column may be the table’s primary key. Sometimes, the primary key can be a tuple of multiple columns. A glyph can be uniquely identified by the tuple (character, font, swash) for example (it can, regardless of whether this is how your particular favourite text system represents it, or whether you think that this is a weird way to store ligatures). The glyphs “(e, Times New Roman Regular 16pt, normal)” and “(ct, Irvin Demibold 24pt, fancy)” are more readily recognisable than the glyphs “146276” and “793651”, even if both are ways to refer to the same data. A music album is identified by the artist and the album name (he says, side-eyeing Led Zeppelin): “A Night at the Opera” is ambiguous while “(Blind Guardian, A Night at the Opera)” is definitely not “(Queen, A Night at the Opera)”.

Use an integer identifier where there is no other way to uniquely identify rows in a table. Note: sometimes there is another, more meaningful way, even where that just means using somebody else’s unique identifier: different copies of the same book will have unique shelfmarks if they’re part of a library, for example. People in an organisation may have an employee number, or a single sign-on user name; though there may be privacy reasons not to use these.

A side-effect of using useful information to identify rows in a database is that it can simplify your queries, because where your foreign keys would otherwise be meaningless numbers, they now actually carry useful information. Here’s an example, from a real application, in which I’m sad to say I designed both the “before” and “after” schemata.

The app is a risk management tool. There are descriptions of risks (I’d like to believe that they all at least have a distinct description but I can’t be sure, so those will use integer id PKs), and for each risk there are people in certain roles who bring particular skills to bear on mitigating the risk. The same role can be applied to more than one risk, the same skill can be applied by more than one role, and one role may apply multiple skills, so there’s a three-way join to work out, for a given risk, what roles and skills are relevant.

The before schema:

create table risk (id integer primary key, description varchar not null, weight integer, severity integer, likelihood integer); -- many fields elided
create table role (id integer primary key, name varchar not null, unique(name)); -- ruh roh
create table skill (id integer primary key, name varchar not null, unique(name)); -- the same anti-pattern twice
create table risk_role_skill (id integer primary key, risk_id integer, role_id integer, skill_id integer, foreign key(risk_id) references risk(id), foreign key(role_id) references role(id), foreign key(skill_id) references skill(id));

In this application, we start by looking at a list of risks then inspect one to see what roles are relevant to mitigating it, and then what skills. So a valid question is: “given a risk, what roles are relevant to it?”

select distinct role.name inner join risk_role_skill on role.id = risk_role_skill.role_id where risk_role_skill.risk_id = ?;

But if we notice the names of each role and skill are unique, then we can surmise that they are sufficient to identify a given role or skill. In fact, the only information we have about roles or skills are the names.

create table risk (id integer primary key, description varchar not null, weight integer, severity integer, likelihood integer); -- many fields elided
create table role (name varchar primary key); -- uhhh...
create table skill (name varchar primary key); -- this still looks weird...
create table risk_role_skill (id integer primary key, risk_id integer, role_name varchar, skill_name varchar, foreign key(risk_id) references risk(id), foreign key(role_name) references role(name), foreign key(skill_name) references skill(name));

Here’s the new query:

select distinct role_name from risk_role_skill where risk_id = ?;

We’ve removed the join completely!

Two remaining points:

  1. There’s literally no information carried in the role and skill tables now, other than their identifying names. Does that mean we need the tables at all? In this case no, but in general we need to think here. How are the names in the join table going to get populated otherwise? If there are a limited set of valid values to choose from, then keeping a table with the range of values and a foreign key constraint to that table may be a good way to express the intent that the column’s content be drawn from that range. As an example, a particular bookstore may have printed, ebook, and audiobook media, so could restrict the medium field in their stock table to one of those values.
  2. Why does the risk_role_skill table have an identifier at all? It is a collection of associations between values, so a row’s content is that row’s identity.

Here’s the after schema:

create table risk (id integer primary key, description varchar, weight integer, severity integer, likelihood integer); -- many fields elided
create table risk_role_skill (risk_id integer, role varchar, skill varchar, foreign key(risk_id) references risk(id), primary key(risk_id, role, skill));

And the after query:

select distinct role from risk_role_skill where risk_id = ?;

Two fewer tables, no joins, altogether a much simpler database to understand and work with.

Posted in design | Leave a comment