I've been working lately on some web applications that have a large "person" component.
Most web apps probably deal to some extent with storing people's names, even if it's not their main focus. (I'm trying to imagine how many database tables around the globe might record in their rows and columns some version of "Dan Jewett."1)
So, in the application, there is a database table that records all of the "person" info, and it has the fields first_name, last_name, and maybe even middle_name, right? I've always thought so. I don't anymore.
This first became an issue for me when I was building an application to track my music collection. Most of the information in the database was seeded by parsing the tag information contained in the audio files themselves or by parsing CD info from various online databases. When I built the original application, I wanted to be able to sort artists alphabetically (by last name, of course), but none of my initial data sources made that distinction. The tags had only an 'Artist' field. So I, believing that I had a more sophisticated understanding of the data I was dealing with, began to massage the data in my sources so that I could parse it easily into a dual-field name structure. Specifically, I began to change all of the artist fields to contain "Dylan, Bob" and "Davis, Miles." That way, in my code, I could just split on the comma and put things where, in a true and just world, they ought to be. Artists who were actually groups just had an empty first_name field (or first_name = "The", last_name = "Beatles"), and "Sting" and "Madonna" were last names.
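A minimal Python sketch of that comma split, for illustration only. The function name `split_artist` is hypothetical, and this is not the code from the original application.

```python
from typing import Tuple


def split_artist(tag: str) -> Tuple[str, str]:
    """Split a 'Last, First' artist tag into (first_name, last_name).

    A tag with no comma (a group or a one-word name) ends up with an
    empty first_name, as described above.
    """
    last, _sep, first = tag.partition(",")
    return first.strip(), last.strip()


print(split_artist("Dylan, Bob"))    # ('Bob', 'Dylan')
print(split_artist("Beatles, The"))  # ('The', 'Beatles')
print(split_artist("Sting"))         # ('', 'Sting')
```

Note how "Sting" lands in last_name by default: the code works, but only because the data was bent to fit it.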
I won't give you a chance to imagine how much time I spent diligently upholding this decision.
This presents the question I should have asked much sooner, "What's so special about a last name?" Or better yet, "Why sort names at all?"
These days, in the digital data realm, we don't return our data sets sorted so much as we return them filtered and grouped. A list is sorted by name as an aid to our eye as we scan through a long data set. A filtered data set returns only the items we were looking for in the first place, and the importance of having that data sorted on a field with arbitrary data (like "name") is lessened. It makes much more sense to be sorting chronologically or grouping by category. And yes, names are arbitrary!
In the Western world we are quite used to dealing with variations on the given name, surname combination. What we really hope for is that everyone will use the typical patronymic arrangement of given name followed by daddy's name, but we're not so naive as to count on this convention anymore. In fact, let's face it, this solution just doesn't scale.
In Spanish-speaking cultures, a name typically has four parts: first name, second first name, father's last name, mother's last name. There can also be middle names, and married names can be added to the end. First name, last name database fields begin to seem rather feeble for dealing with names like this.
As in Hispanic cultures, a Chinese name typically has more parts than two fields allow for: a family name, a generational name, and a given name. There are actually only 472 Chinese surnames for a population of over 1 billion people!
Middle Eastern names, typically in Arabic, require at least four components from six or seven categories: honorific, personal, descriptive, patronymic, geographical or tribal, and occupational.
There is no possibility of defining a convention to fit these cross-cultural naming differences into a first name, last name paradigm. Adding extra fields, such as middle name, does nothing to help.
So let's throw out the first name, last name approach and go with a single name field. What are the consequences?
I see two right away. The first has to do with the difference between manipulating the digital representation of real world objects and manipulating the objects themselves. Sorting is much more important if you are looking at racks of record albums and trying to find one you haven't played in years. We can't filter a shelf full of books. We have to pick through it. It makes sense that we would want to keep all of the recordings by a particular artist together on the shelf. But do we need to use the artist's last name?
Here is a list of recording artists' names sorted a few different ways:
By first name: | By last name: | Last name first: |
---|---|---|
A. Abrams | A*Teens | A*Teens |
A. K. Salim | A-One | A-One |
Abbey Lincoln | The AALY Trio | AALY Trio, The |
Abdullah Ibrahim | Philip Aaberg | Aaberg, Philip |
Adam Holzman | Jose Jimenez Abadia | Abadia, Jose Jimenez |
Adam Sandler | John Abercrombie | Abercrombie, John |
Adelaide Hall | Rabih Abou-Khalil | Abou-Khalil, Rabih |
Adrian Belew | Muhal Richard Abrams | Abrams, Muhal Richard |
Adrian Legg | A. Abrams | Abrams, A. |
Agnes Buen Garnas | Acoustic Alchemy | Acoustic Alchemy |
Ahmad Jamal | Beegie Adair | Adair, Beegie |
Aimee Mann | George Adams | Adams, George |
Al Green | Pepper Adams | Adams, Pepper |
Al Martino | Oleta Adams | Adams, Oleta |
Al Jarreau | Nat Adderley | Adderley, Nat |
Al Hibbler | Cannonball Adderley | Adderley, Cannonball |
Al Cooper | The Adderley Brothers | Adderley Brothers, The |
Al Wilson | Sunny Ade | Ade, Sunny |
Al Cohn | Ron Affif | Affif, Ron |
Al Casey | African Headcharge | African Headcharge |
Al Hirt | Afro Algonquin | Afro Algonquin |
Alan Broadbent | Afro Cult Foundation | Afro Cult Foundation |
Alan Silva | Ailana | Ailana |
Albert King | Air | Air |
Albert Ayler | Kei Akagi | Akagi, Kei |
It seems that sorting by last name and displaying by full name (the second list) produces a somewhat visually disconcerting list, so let's not use that one. The first name and last name sorts produce different content for names beginning with "A." But do we really care? I find I can't come up with a reason why it matters, and I'm absolutely certain I have never uttered a sentence like, "My favorite tenor saxophone player is Coltrane, John." On your reception dinner guest list you wouldn't sort on last names so that you could put all the Johnsons and Johnstons at the same table. You might, however, sort your donor list by contribution size to properly list them in the program at your annual "Save The Rotary Dial Phone" banquet. And that would be a sort that was useful.
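To make the comparison concrete, here is a small Python sketch. The four artist names come from the first column of the table. The donor names and amounts are invented purely for illustration.

```python
# A single name field sorts as-is; this is the "By first name" column.
artists = ["Al Green", "Abbey Lincoln", "A. K. Salim", "Adam Sandler"]
by_name = sorted(artists, key=str.casefold)
print(by_name)

# A sort that carries meaning: donors by contribution, largest first.
# These (name, amount) pairs are hypothetical sample data.
donors = [("Donor A", 50), ("Donor B", 500), ("Donor C", 125)]
by_amount = sorted(donors, key=lambda donor: donor[1], reverse=True)
print([name for name, _amount in by_amount])
```

Neither sort needs to know where a name splits into parts.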
The second issue with using a single field for names in a database comes up when we want to contextualize the name. If we are writing an application or sending out a mailing where we would like to address a person less formally than by using their full name, it would be difficult to parse the name field with any confidence. I often receive mail in which my first name has been merged into a letter template. The letter might not read so smoothly if my name were "Luis Eduardo García Fernández Luna Galván" and it was inserted repeatedly into the text. On the other hand, I always throw those types of letters into the recycling without reading them anyway.
To allow for some contextualization, many Web forms now provide a field for a user to supply an informal name, nickname, or screen name in addition to their full name. Adding a field for a person's title also helps with contextualization.
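One way such a record might look, sketched in Python. The class and field names (`Person`, `full_name`, `informal_name`, `title`) are assumptions for illustration, not a schema from any particular application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    full_name: str                       # stored exactly as the person wrote it
    informal_name: Optional[str] = None  # nickname or screen name, if supplied
    title: Optional[str] = None          # e.g. "Dr.", if supplied

    def salutation(self) -> str:
        """Name for a casual greeting; falls back to the full name."""
        return self.informal_name or self.full_name

    def formal(self) -> str:
        """Title plus full name when a title was given, else the full name."""
        return f"{self.title} {self.full_name}" if self.title else self.full_name


luis = Person("Luis Eduardo García Fernández Luna Galván", informal_name="Luis")
print(f"Dear {luis.salutation()},")  # Dear Luis,
```

The point of the sketch is that nothing here parses the full name. The person tells us what to call them.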
And so, after much consideration, I declare, "I'm finished with the tyranny of last name first! I will no longer be intimidated by the name 'Abu Karim Muhammad al-Jamil ibn Nidal ibn Abdulaziz al-Filistin'2 and I welcome him in my database tables!" Now I just have to adjust about 50,000 mp3 tags and get those commas out of there.
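Getting the commas back out might look roughly like the Python sketch below. The function name `unflip` is hypothetical, and it covers only the string handling. Reading and writing the actual mp3 tags would need a tagging library, which is not shown.

```python
def unflip(tag: str) -> str:
    """Turn 'Last, First' back into 'First Last'; leave other tags alone."""
    last, sep, first = tag.partition(",")
    if not sep:
        return tag.strip()  # no comma: "Sting", "Acoustic Alchemy"
    # Caveat: a name that legitimately contains a comma would be mangled
    # here, so a real cleanup would need a list of exceptions.
    return f"{first.strip()} {last.strip()}".strip()


print(unflip("Dylan, Bob"))    # Bob Dylan
print(unflip("Beatles, The"))  # The Beatles
print(unflip("Sting"))         # Sting
```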
1. Not Dan Jewett the co-composer of 'Round Here, but Dan Jewett the New York area nerd.
2. Which translates to, "Father-of-Karim, Muhammad, the beautiful, son of Nidal, son of Abdulaziz, the Palestinian."