Is it time to stop saying 'data are'?

I am going to stop saying ‘data are’ and just say ‘data is’.

Language should be an aid to understanding, not a barrier, and ‘data are’ adds to the wall of jargon that could exclude the people we need to be talking about data and technology.

Moreover, after a thumb through a dictionary and a rummage around Google Ngram, I guess that most people use the term incorrectly anyway.

Definition

Collins dictionary defines data as either the plural of datum or an uncountable noun (I’ve only just learned that term). Another example of an uncountable noun is information.

So, when we use data as a plural of datum, we should say ‘data are’ and when using it as an information-like uncountable noun we should say ‘data is’.

So how often do we say ‘data is’ compared to ‘data are’?

A question of education?

This graph from Google Ngram (US American books 2012) shows the relative use of each term since 1970 (the start of the modern computing epoch). In 1970, ‘data are’ was used three times as often as ‘data is’, but since then the two have converged to a similar frequency of use.

So do we actually mean to use the plural half of the time?

Let’s look at some other nouns and see how often we say them in the singular and the plural.

The data is a little tricky to read in the format Google presents it, but here is a summary:

Solution appears about three times more often than solutions (2.741). Pedant appears about twice as often as pedants (1.890). Vegetable appears a bit less often than vegetables (0.740).

So perhaps we would expect the word ‘data’, used as a plural, to appear with a similar(ish) frequency relative to ‘datum’ (OK, I admit, more digging required and ideally by a linguist…).

But we also know from the first graph that ‘data are’ is used a similar number of times to ‘data is’. So we would expect to see the word ‘data’ appear twice as often as the other plurals to account for its use as an uncountable noun. So let’s give it some room and guess at up to ten times as often.

Data appears 173 times more often than datum (173.033).
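For anyone who wants to check the arithmetic, here is a rough sketch in Python. It uses only the ratios quoted above; the variable names are my own, and the factor of ten is the generous guess from the text, not a measured value.

```python
# Singular-to-plural ratios quoted in the post (from Google Ngram).
singular_to_plural = {
    "solution/solutions": 2.741,
    "pedant/pedants": 1.890,
    "vegetable/vegetables": 0.740,
}

# Flip each ratio to get plural-to-singular, so it is comparable
# with the data-to-datum figure.
plural_to_singular = {pair: 1 / r for pair, r in singular_to_plural.items()}

data_to_datum = 173.033  # ratio quoted in the post
generous_guess = 10      # the "up to ten times" allowance (an assumption)

# The most plural-heavy of the ordinary nouns (vegetables).
largest_ordinary = max(plural_to_singular.values())

# How far 'data' overshoots the generous guess, and the ordinary nouns.
excess_over_guess = data_to_datum / generous_guess
excess_over_ordinary = data_to_datum / largest_ordinary

print(f"Largest ordinary plural-to-singular ratio: {largest_ordinary:.3f}")
print(f"Data/datum overshoots the guess by a factor of {excess_over_guess:.1f}")
print(f"Data/datum is {excess_over_ordinary:.0f} times the largest ordinary ratio")
```

Even with the tenfold allowance, ‘data’ turns up roughly seventeen times more often than the guess would predict, which is hard to explain if half of those uses were genuinely plural.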

So what?

It’s easy to think that this is all just a bit pedantic. Who cares as long as everyone understands what is meant?

  1. Anything that jars is a distraction. The other day, a UK Government minister used the term during a press conference. Don’t ask me what the press conference was about, because after he said ‘data are’ I stopped concentrating on the message.

  2. It sacrifices inclusivity to feel included. I suppose ‘data are’ was something that the Government minister picked up on the job and then parroted in a way that sounded authoritative. The trouble is, it was as convincing as when he informed us of a new ‘app’ that was being developed using a special kind of low-energy Bluetooth (every Year 8 kid who has played with a MicroBit programmes Bluetooth LE…).

The trouble is, however good it feels to say the right thing (even if, as we have seen, it is probably wrong), it puts another brick in the wall of intimidating jargon.

So, perhaps we should all make a pact to only say ‘data are’ if we (and more importantly the people we are talking to) have used the word datum in the last year.