Why ad networks should optimize for precision and not recall

When I was living in Austin and working on my PhD, I owned several motorcycles, and spent lots of time online researching parts, upgrades, repairs, etc. on sites like svrider.com and vfrworld.com. Without sites like these, when I had a problem with my motorcycle, I would have had to read the shop manual, go to the parts store, talk to a mechanic, call friends to ask for help, etc. I still did these things on occasion, but online resources made information more immediately accessible, and made my research much more efficient. This sort of information availability is one of the defining disruptions of the web.

And not surprisingly, deep content is really my favorite “part” of the web. But internet ads, especially those on many of the sites I frequent, just don’t get my attention. Many ad networks today claim to have awesome semantic targeting technology that can develop complex models for interpreting content in order to place the most relevant ad. But if a forum post is discussing steel brake lines for a motorcycle, and the ad network only has a generic ad for an auto parts store, then the placement can only be so relevant, regardless of technology. Some of the best content on the web is quite deep, but most ad inventory lacks the required specificity. The reason why ad networks today aren’t able to get my attention has nothing to do with their technology. It is entirely due to a lack of inventory.

If I’m reading content about motorcycle brakes, an effective ad should directly address the subject of motorcycle brakes. Perhaps a book on motorcycle brake repair from Amazon, or brake lines for my Suzuki sold through an online merchant.  Semantic targeting technology may be able to make inferences like “Suzuki is a type of motorcycle manufacturer”, and then show a banner ad to buy a new Yamaha bike (with a low APR!). But I can only see so many of these without turning a blind eye, and most of the time I’m not in the market for these sorts of offers anyway.

Highly specific ad inventory has greater potential for high precision: an ad for a book on motorcycle brake maintenance won’t be relevant to every page on a motorcycle site, but it will be highly relevant to some pages. Generic ads have higher potential for recall: an ad for a new Yamaha motorcycle has relevance to potentially any page on a site about motorcycles.  Of course, my argument here is that this sort of relevance just isn’t good enough to capture my attention.
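To make the two terms concrete, here is a minimal sketch in Python. The page names and the judgments about which pages an ad is "relevant" to are made up for illustration; they are not data from any real ad network. Precision is the share of pages where the ad was shown that were actually relevant, and recall is the share of relevant pages where the ad was actually shown.

```python
def precision_recall(shown_on, relevant_to):
    """Compute (precision, recall) for a single ad.

    shown_on:    set of pages where the ad was displayed
    relevant_to: set of pages where a reader would find the ad relevant
    """
    hits = len(shown_on & relevant_to)
    precision = hits / len(shown_on) if shown_on else 0.0
    recall = hits / len(relevant_to) if relevant_to else 0.0
    return precision, recall


# Hypothetical pages on a motorcycle forum.
all_pages = {"brake-lines", "brake-pads", "chain-lube", "buying-a-new-bike"}

# Generic ad (new Yamaha, low APR): shown everywhere, but assumed to be
# genuinely relevant only to readers of the buying thread.
generic = precision_recall(all_pages, {"buying-a-new-bike"})

# Specific ad (book on brake maintenance): shown on one page only, and
# assumed relevant to both brake threads.
specific = precision_recall({"brake-lines"}, {"brake-lines", "brake-pads"})

print(generic)   # (0.25, 1.0)
print(specific)  # (1.0, 0.5)
```

The generic ad reaches every page it could matter to but is mostly noise, while the specific ad misses a page but is on target wherever it appears.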

My previous startup, Adtuitive, focused on this problem directly. We worked in the retail sector, and created thousands or even millions of ads for each advertiser. We created these ads automatically via a content extraction algorithm that was able to recognize and parse products from the web. This problem of automatic ad creation was actually much harder than the ad matching itself. Automatic ad creation enabled us to create a database of tens of millions of products. Our ad targeting was very precise, but not because of any sort of next generation semantic technology. We used well-tuned keyword matching algorithms and leveraged the scale and diversity of our ad database.
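As a rough illustration of the idea (not the actual algorithm we ran in production), keyword matching against a product-level ad database might look something like the sketch below. The ads, keywords, scoring rule, and `min_score` threshold are all assumptions chosen for the example. The key design choice is that the matcher returns nothing when no ad clears the threshold, which trades recall for precision.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_ad(page_text, ads, min_score=0.5):
    """Return the title of the best-matching ad, or None if nothing fits.

    ads maps an ad title to a string of keywords. The score is the
    fraction of an ad's keywords that appear on the page.
    """
    page_tokens = tokenize(page_text)
    best_title, best_score = None, 0.0
    for title, keywords in ads.items():
        keyword_tokens = tokenize(keywords)
        if not keyword_tokens:
            continue
        score = len(keyword_tokens & page_tokens) / len(keyword_tokens)
        if score > best_score:
            best_title, best_score = title, score
    # Prefer showing no ad over showing a weak match.
    return best_title if best_score >= min_score else None


# Hypothetical ad inventory.
ADS = {
    "Steel brake lines for Suzuki": "steel brake lines suzuki",
    "New Yamaha bikes, low APR": "yamaha motorcycle financing",
}

print(best_ad("Anyone installed steel brake lines on a Suzuki?", ADS))
print(best_ad("Best route through hill country this weekend?", ADS))
```

The first call picks the brake lines ad; the second returns `None` because neither ad shares any keyword with the page. With only two ads most pages get nothing, which is why the size and diversity of the ad database matter more than the matching logic.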

In the end, Adtuitive never reached the scale needed to disrupt the ad industry. We were presented with an awesome opportunity to apply ourselves and our technology to problems at Etsy, where I now lead a great team of engineers and data scientists working on search, advertising, and personalization.

However, I still see huge opportunities in this space.