When it comes to selling books we all acknowledge that in the key western markets Amazon rules the roost, with total dominance over digital sales, and is widely regarded as owning half the print market.

But when it comes to selling its own imprint titles Amazon barely nudges the needle in the print sector.

In the ebook sector it’s a very different story, as we’ve explored here at TNPS on many occasions.

For the new year the UK’s The Bookseller announces it will be posting weekly snapshots of the UK ebook charts as determined by Bookstat, the controversial data analysis machine run by Paul Abbassi, formerly known as Data Guy.

The first snapshot, observing that seven of the UK top ten ebook titles are A-Pub (Amazon Publishing), will carry no surprises for TNPS regulars.

But the Bookstat chart seems to have surprised The Bookseller, which notes,

The first chart, for the week ending 4th January shows the extent to which Amazon and its subscription service Kindle Unlimited can dominate e-book sales, with four of its titles in the top five positions.

A close look at the ebook chart also shows the way in which US and UK reading tastes diverge. In the US romance is widely believed to be the most popular genre, with Amazon romance imprints like Lake Union leading the ebook charge.

In the UK, the mystery and thriller sector of the market is widely acknowledged as the most active, so the Bookstat assertion that the Amazon thriller imprint Thomas & Mercer is the driving force in UK ebook sales isn’t a big surprise.

The problem is, just what are we to make of Bookstat’s numbers?

The Bookseller explains,

Bookstat tracks sales by measuring the relative movement of titles on publicly available retailer websites, calibrating them against the actual title sales reported by partner publishers.

In other words, these numbers are not real numbers reported by booksellers, but guesstimates based on chart position, using techniques developed back when Abbassi ran the Author Earnings website, and that’s where concerns arise.

Author Earnings was notorious for how it presented selected data to make a particular point, on each occasion ensuring deliberate inconsistency with previous reports so it was difficult to make meaningful judgments from one report to the next.

That’s not just my opinion. This was Porter Anderson and Jane Friedman in the Hot Sheet in February 2018:

The reports were erratic in focus, each changing the basis for analysis, so that no consistent comparative picture could be built from them.

And on the rare occasions when there was a clear relationship between new and past data it quickly became apparent the numbers Abbassi gave us were whatever suited his case that week. Often the historical numbers would be changed to match the latest agenda.

In 2018 Data Guy, as Abbassi then called himself, realised the numbers he’d given us in 2016 didn’t match up with what he wanted to tell us in 2018, so he casually updated the 2016 numbers to suit.

It was not the only concern raised about the Author Earnings-Bookstat accounting methods.

In a 2018 post I took a look at the first Bookstat numbers Data Guy had made public, and compared them with the then-public numbers Data Guy had given us when Bookstat was still called Author Earnings.

N.B. The links will take you to a dead end because Data Guy has since taken down the entire Author Earnings website to make sure further comparisons between his historical data and the data he is selling now cannot easily be made.

Data Guy asserts in the January 2018 Author Earnings Report that in Q2-Q4 2017 the total value of ebooks sold in the US was $1.3 billion. (Dead link.)

Data Guy makes the point that seasonal differences in the ebook sector are negligible, so it would not be unreasonable to assume that, if $1.3 billion worth of ebooks were sold in Q2-Q4 then for the whole year the value would be $1.7 billion.
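For anyone who wants to check that arithmetic, here is a minimal sketch of the extrapolation. It assumes, as Data Guy himself argues, that sales are spread evenly across the year; the function and variable names are mine, purely for illustration, and have nothing to do with how Bookstat works internally.

```python
def annualise(partial_total, months_covered, months_in_year=12):
    """Scale a part-year total up to a full year, assuming flat monthly sales."""
    return partial_total * months_in_year / months_covered


# Q2-Q4 2017 US ebook value as stated in the January 2018 report, in $ billions.
q2_q4_value_bn = 1.3

# Nine months of data scaled to twelve.
full_year_bn = annualise(q2_q4_value_bn, months_covered=9)
print(f"Estimated full-year 2017 value: ${full_year_bn:.1f} billion")
```

If seasonality were not negligible the estimate would shift, but it would take a remarkable first quarter to change the overall picture.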

Now that’s an impressive number, of course. But here’s the thing.

In the February 2017 Author Earnings Report looking at the full year 2016, Data Guy told us categorically that total US ebook sales were worth $3.2 billion. (Dead link.)

Wait. What? Has the US ebook market collapsed between 2016 and 2017?

Well, we all know the ebook sales of the Big 5 have plummeted. But by that much?

If the market has collapsed, why isn’t Data Guy telling us about it? If it hasn’t, then allowing for Data Guy’s claims of precision with the new numbers, were Data Guy’s numbers wrong before?

Have we previously been told, erroneously, that the US ebook market was worth much more than it actually was?

The post then took a look at Data Guy’s report on US romance sales, where again what he was telling us one minute bore no relationship to what he was telling us the next.

Back at the RWA Conference in 2016 Data Guy confidently stated 235 million romance ebooks were sold in 2015. (Dead link.)

Impressive! If accurate. But was it?

Here’s the thing: when Data Guy in January 2018 offered us his Q2-Q4 numbers for unit romance sales for 2017 he clearly gives us a number for just over 50 million.

Extrapolating for the extra three months to give us a full year we arrive at a number of 66.6 million. A far cry from the 235 million romance sales Data Guy was asserting back in 2016.
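To put the gap in proportion, a rough sketch of the comparison follows. It rounds "just over 50 million" down to 50 million and assumes flat sales across the year, so treat the result as a ballpark rather than a precise figure; the variable names are illustrative only.

```python
MONTHS_REPORTED = 9  # Q2-Q4 covers nine months

# "Just over 50 million" romance units for Q2-Q4 2017, rounded down (assumption).
romance_units_q2_q4 = 50_000_000

# Scale nine months up to twelve, assuming no seasonal variation.
romance_units_full_year = romance_units_q2_q4 * 12 / MONTHS_REPORTED

# The figure Data Guy stated at the 2016 RWA Conference.
earlier_claim = 235_000_000

ratio = romance_units_full_year / earlier_claim
print(f"Extrapolated full year: {romance_units_full_year / 1e6:.1f} million units")
print(f"Share of the earlier claim: {ratio:.0%}")
```

On those assumptions the newer number comes to a little over a quarter of the older one.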

Romance, so we are told, is the biggest-selling genre (but hold that thought – see below), so this much smaller number of romance ebooks being sold, as per Data Guy in 2018 compared to 2016, is consistent with the overall ebook dollar value being much smaller as reported in 2018 compared with 2017. Again, these are all Data Guy’s publicly revealed numbers, so who knows what other contradictions lie in the Bookstat data we are not being allowed to see.

It gets worse. Data Guy even admitted his stats weren’t actually all that accurate.

Sometimes, especially when an indie isn’t using ISBNs (and most aren’t), the algorithms don’t do a great job of auto-correlating different versions of the same book selling at different retailers, so they look like different books to our spider. Foreign-language editions, even very low-selling ones, also creep into the title count if they are available from US retailers.

Okay, so blame indie authors for not using ISBNs. The problem then is that, given most indies don’t, that means Bookstat figures for self-publishers cannot be at all relied upon.

But what about print? All print books have ISBNs, so safe to say the Bookstat sales numbers for print sales must therefore be accurate. And especially so for big names like JK Rowling, which cannot be fudged with other authors and titles of the same name.

Well, one might think so, but in the 2018 Bookstat details that were publicly shared Data Guy even managed to get JK Rowling’s numbers wrong. No, not her sales numbers, which nobody but her, her agent and her publisher will know for sure, but the ever so simple matter of how many titles she has.

You would think a Middle Grade school kid could take a look on Amazon and work that out, and maybe they have. But Data Guy’s Bookstat managed to mess up even that simple task, listing JK Rowling’s titles twice on the list of bestselling print authors, at #5 and #17.

Data Guy explained that away in his usual dismissive tone reserved for anyone daring to question blatantly erroneous numbers.

Inconsistency in the publisher-entered “author name” metadata at the different retailers, or for different books by the same author at the same retailer. One of the many items on the cleanup list…

TNPS was not prepared to let it go so lightly.

But hold on, Data Guy. Just now it was the fault of indies not using ISBNs. But ALL print books have ISBNs. And exactly which publisher would inconsistently enter the name JK Rowling?

There is no room for confusion here. This is not a problem at the publisher or retailer end. This is a problem at the BookStat end.

But it gets worse. A closer inspection reveals that JK Rowling is not only listed twice, but at position #7 she is attributed 76 titles and at position #17 her title count rises to 93.

At which point we might ask, is 76 correct, or is 93 correct, or do we need to count them both and make 169, or are none of these numbers actually accurate? After all, when we turn to JK Rowling in the ebook list she has 220 titles!

Data Guy declined to comment on that.

I concluded that 2018 fisking of Abbassi’s Bookstat numbers (remember, these are all numbers Abbassi himself made public) by questioning how Bookstat used BISAC categories for accounting purposes and why the resulting numbers cannot be relied upon because Bookstat was clearly double-counting title sales.

According to Data Guy’s January 2018 report Literature & Fiction was the single biggest-selling genre in ebooks in the nine months measured. Some 70 million ebook units valued at $330 million were recorded.

Below that comes Mystery Thriller & Suspense with 35 million unit sales and $187m in revenue, while Romance comes in with a lower dollar value of $160 million for more unit sales – 50 million.

But hold on, since when was Literature & Fiction the biggest selling genre? Come to that, since when was Literature & Fiction a genre at all? It’s a catch-all category for most adult fiction, to separate it from children’s and YA titles and non-fiction.

Seriously, who amongst us has ever listed a book just as Literature & Fiction? The retailers and aggregators might put our titles into that category automatically, in order to reach relevant sub-categories like mystery, thriller, suspense, romance, sci-fi or whatever, but Literature & Fiction as a genre is utterly meaningless.

What genre do you write, friend? Oh I write literature and fiction. It outsells everything else. It’s the only genre to be in.

Yet here is Data Guy telling us Literature & Fiction sold twenty million more units than romance and twice as many as mystery, suspense and thrillers.

To confuse matters still further, Data Guy’s January 2018 Author Earnings Report – using the data Big Pub is forking out big money for – has as many issues with genre as it does with title counts.

In the list of top audio book genres Data Guy tells us that #1 is, surprise, surprise, Literature & Fiction.

At #2? Well, there’s another top-selling genre we’ve all been ignoring and missing out on millions of sales from. Fiction & Literature. No, seriously.

At #3 is Mystery, Thriller and Suspense. Not, of course, to be confused with Mysteries and Thrillers at #10. Go figure.
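To show how avoidable the first of those duplicates is, here is a small sketch of a label check. This is my own illustration, not Bookstat's method: it simply reduces each category label to an unordered set of words, so that the same words in a different order count as the same bucket. The helper name `label_key` is hypothetical.

```python
import re


def label_key(label):
    """Reduce a category label to an order-independent set of lowercase words."""
    parts = re.split(r"[,&]|\band\b", label.lower())
    return frozenset(p.strip() for p in parts if p.strip())


labels = [
    "Literature & Fiction",
    "Fiction & Literature",
    "Mystery, Thriller and Suspense",
    "Mysteries and Thrillers",
]

# Group labels that boil down to the same set of words.
groups = {}
for label in labels:
    groups.setdefault(label_key(label), []).append(label)

duplicates = [names for names in groups.values() if len(names) > 1]
print(duplicates)
```

A check this crude catches the Literature & Fiction pair. The two mystery labels differ in wording as well as order, so merging those would need a hand-built mapping, which is exactly the kind of cleanup one would expect from a paid data service.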

Bizarrely the audiobook sales for Literature & Fiction and Fiction & Literature combined are equal to the combined sales of Mystery, Thriller, Suspense and Science Fiction and Fantasy and Romance and Biography and Memoir.

Bearing in mind this is just from the sliver of data we’ve been allowed to see, it raises the question of what other inconsistencies are lurking in the full Bookstat report behind the $10 million turnover paywall.

Almost two years on and we have no reason to assume Bookstat has improved its act, as Abbassi is careful not to make numbers widely available, while having taken down the historical Author Earnings and Bookstat data that was publicly available on the Author Earnings website, to prevent unhelpful comparisons with what little data he does put out.

Which raises the question of whether we can take seriously the Bookstat charts being provided to The Bookseller this year.

There’s no reason to doubt the dominance of A-Pub in the Amazon charts, and given the lack of ebook competition in the UK it’s reasonable to assume A-Pub titles will dominate the nation’s ebook chart too. But the numbers?

Given the track record of Bookstat as outlined above, and the need to hide the Author Earnings data, there’s every reason to remain wary.