Big Data and the 2012 Presidential Election

Video from our talk on the main stage at the recent IAB Ad Ops Summit in NYC.  Special thanks to Elaine Harvey at Resonate Insights and the IAB.

Real-Time Bidding and the 85%*

*of people that don’t delete cookies.

Real-time bidding (RTB) for online advertising made the New York Times Magazine section on Sunday, where it will be read by around 1.6 million people.  Expect lots of discussion around the horrors of online privacy invasion as a result.

It’s a good article, and worth the read.  And while we all believe online privacy is important, people seem to have forgotten the days when most information wasn’t free.  Want to read the news, or check stock prices?  Buy a newspaper.  Want to keep up with current events without buying a paper or magazine?  Then tune in to watch the news being drip-fed to you by TV anchors.  Ugh.  That was not that long ago, people.

Most internet content is free for one reason:  someone is willing to pay for it.  And that someone is the advertising community.

In the first half of 2012, internet advertising revenues climbed to an all-time high of $17 billion in the US alone, according to the Interactive Advertising Bureau.  That is serious business.  Without this revenue stream, it’s unlikely that we’re getting all this “free” internet content.

What does this have to do with RTB and targeting?  Data, cookies, and targeting are the lubricant that makes RTB possible.  Consider that fewer than one in a thousand people will click on a typical banner ad.  So the more accurately an advertiser can target a user with an ad, the more likely it is that the ad will resonate with that user (whether it’s to generate a click, or to “lift” their “brand,” or whatever it is the advertiser is looking to achieve with their advertising campaign), and the more the advertiser is willing to pay to serve the ad as a result.  That’s a big part of what RTB does.

The existence of a real-time marketplace that allows advertisers to finely segment on user, page, location, time of day, and myriad other criteria is driving ad revenues up.  Take away the ability to target users based on their interests, and we can expect ad campaign metrics to decrease, and ad revenues to follow suit, taking a toll on the freshness and quality of online content that publishers can afford to make available.

I would argue that targeting is healthy for the future of the internet.  But at the same time no one wants to see ads that violate their privacy.

So what to do?

1.  First, if seeing targeted ads is really so horrid for you, delete your cookies.  According to the NY Times article, “fewer than 15 percent of Internet users turn off or limit their cookies, according to surveys.”  When you delete your cookies, you completely delete your surfing trail.

Here’s an interesting test:  Go to BlueKai’s website and look at what it knows about you (per the NY Times article).  Wow, that’s a lot of information.  Now delete your cookies and try it again.  Voila!  You can even set your browser preferences to delete cookies automatically every time you quit a browsing session.  Don’t forget to delete your Flash cookies too.

Seems like a pretty simple solution to all of this brouhaha around online privacy.

One might argue that if everyone regularly deleted their cookies, the advertising industry would suffer, and online content would suffer from a funding shortfall, just as if we completely blocked the ability to target users.  But that’s probably not the case.  Let’s assume that starting tomorrow, cookies are allowed to live for no longer than 3 days.  The advertising technology industry is filled with super smart analyst types who can figure out ways to deal with this new snag.  We can accurately discern patterns of behavior and interests even from cookie lifetimes of less than 3 days.

So when I start browsing again with a fresh cookie, I’m anonymous, but I can still be targeted effectively using aggregated, anonymous targeting algorithms built from pools of behavior based on 3 days of browsing activity.  That means that I’m seeing ads that are reasonably relevant for me, and the advertisers are still relatively happy, but the shoes I looked at 4 days ago will stop following me around the internet and creeping me out.

2.  Change DNT as it stands today.  The current Do Not Track initiative is likely to go nowhere.  Microsoft turns on the Do Not Track header by default in its new Internet Explorer 10 browser, which tells advertisers not to serve any targeted ads to that browser.  So what does the advertising industry do?  It declares that any time it sees an IE10 browser with DNT:1, the header was set by default by the browser and not intentionally by the user, and it ignores the setting.

Instead of DNT as it currently exists, why not set cookie lifetimes to 3 days by default?  Seems like a workable compromise.  Then the data scientists can spin off and work on high-accuracy targeting algorithms that find ways to optimize the performance of ads using three days of browsing history as the data set for their analytics.
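Mechanically, the change is trivial.  Here’s a minimal sketch of what a 3-day default would look like on the wire, using only Python’s standard library; the cookie name and value are purely illustrative:

```python
# A minimal sketch of a cookie capped at a 3-day lifetime; the "uid"
# name and value are hypothetical, not from any real ad platform.
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["uid"] = "a1b2c3"                   # anonymous visitor ID (made up)
cookie["uid"]["max-age"] = 3 * 24 * 3600   # expire after 3 days = 259,200s
print(cookie["uid"].OutputString())
# uid=a1b2c3; Max-Age=259200
```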

3.  Institute additional, stringent rules that limit the ads that can be served.  Whether it’s a medical condition, extramarital activities, whatever, we should create and enforce strict limits regarding what’s OK to display to an online user. Targeting me with a Viagra ad because I read about its side effects, or worse, because I typed ED into Google and hit enter before I could type Sullivan right after it, should definitely be off limits.  Seriously.

It’s naive to think that targeted online advertising is going away.  Ask someone if they want privacy, they’ll say yes.  But tell them that it comes at the cost of the internet no longer being free, and you’ll get a very different answer.


Is It Really Different This Time?

I’m taking notice as many experts (and not just Nouriel Roubini) are predicting impending doom and gloom in the equity markets because of the ongoing problems in the EU, the huge and still expanding US debt levels, the hangover from the Great Recession, the effects of entitlement spending, expiration of the Bush tax cuts, Syria, Iran, and whatever else we can throw on the pile.

My concern is not where the markets will be next month, but instead whether we’re in for another decade or more of flat – or worse, declining – share prices.  So I thought it could be illustrative to consider equity price levels historically.  A wide range of disruptive events have occurred in the past (world wars, recessions, high tax rates, low tax rates, dotcom insanity, …), so if there is a lasting/permanent effect of events on the markets, it should become apparent.

I started by looking at the S&P 500 closing prices going back to 1950.  I found that the index increased at a remarkably steady compound annual growth rate (CAGR) of 7%.

Using the historical actuals to extrapolate into the future, the S&P 500 index can be expected to cross the 2,000 price level somewhere in mid 2017, less than 5 years from today (for a healthy 42% gain over current levels).
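The crossing-point arithmetic is easy to reproduce.  Here’s a minimal sketch, assuming a starting level of about 1,410 (consistent with the 42% gain to 2,000 mentioned above); the post’s exact crossing date comes from fitting the full trend line back to 1950, so it differs slightly:

```python
# Back-of-the-envelope: years until the S&P 500 crosses 2,000 at a 7% CAGR.
# The starting level is an assumption implied by the post's 42% gain.
from math import log

cagr = 0.07
current = 1410.0   # assumed late-2012 S&P 500 level (2000 / 1.42)
target = 2000.0

years = log(target / current) / log(1 + cagr)
print(f"~{years:.1f} years until the index crosses {target:,.0f}")  # ~5.2 years
```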

[Chart: Extrapolating historical S&P 500 data into the future]

How sensitive are the equity markets’ long term prospects to various events?  While there is zero chance of accurately predicting where the market will be next month, can we predict where it will be in 5 years with any degree of confidence?  To test, I removed large chunks of the market data from the analysis to see what would happen.  For example, from 1995-2003 we saw a dramatic bubble/bust cycle.  What happens if we remove that time period (in other words, the Y2K and dotcom booms, and the subsequent bust, never happened) from the analysis?

Not much, as it turns out.  Assuming 1995-2003 never happened, the S&P 500 crosses the 2,000 level only 6-8 months later than in the previous analysis.

I re-ran the analysis with various time periods removed and found pretty much the same thing.  Here’s the data assuming the Great Recession that began in 2008 never happened (if only it were that easy):

If Wall Street had not blown up the world in 2008, we could have expected the S&P to hit 2,000 around the beginning of 2014 (thank you, Wall Street, for inventing the synthetic CDO).

Long term, the historical data suggests that the S&P 500 will continue to grow at a 7% CAGR, and will cross the 2,000 mark somewhere around 2017, give or take.

The only scenario I can think of that would invalidate this historical trend is if the equity markets could no longer be viewed as a representative proxy for the economy; for example, if large organizations like Apple and ExxonMobil were to continue to operate, but no longer exist as public corporations.  Everything else – staggering debt, an incompetent Congress, funding expensive wars and social programs, financiers putting their own interests ahead of their clients’ – the markets have seen before, and it is priced into the historical data.

I have no idea where the market is going short term.  But analyzing the long term data helps to drown out the short term noise.  I am staying long equities.  Unless a private equity firm takes Apple private.


5 Good Reasons Not To Buy Facebook Stock

1.  Lockup expiration

Yesterday, 271 million Facebook shares were freed from lockup restrictions, with an additional 1.6 billion shares expected to be released from lockup restrictions over the next nine months.  For context, prior to yesterday’s expiration, only 500 million shares could be freely traded.  Over the short term, we should expect sellers to far outweigh buyers.  Yesterday (lockup expiration day) there was a 6% drop in FB’s share price with 135 million shares traded, compared with 30 million shares on an average day.

2.  Market saturation

Facebook claims it has more than 900 million monthly active users. That’s almost half of all Internet users on the planet.  Is it more likely that 5 years from now they’ll have 500 million users (think MySpace) or 2 billion?  Even if FB can grow its active user base to 1.5B users, any meaningful revenue growth will have to come from new ways to monetize the users they already have, not from attracting new users.

3.  Facebook’s addressable online advertising market is limited to “display”

So with almost half the total worldwide Internet population already active on Facebook, where can FB’s growth come from?

Consider that Google boasts an Average Revenue Per User (ARPU) of $29 (almost all of which is from online advertising), while Facebook’s ARPU is only around $4.

Google has 45% of the total (US) online ad revenue; Facebook has 6%.

So it’s easy to assume that Facebook can appreciably grow its revenues over time simply by doing a better job monetizing its existing user base with advertising.

But that’s not the case.

Display-related online advertising accounts for just 36% of the total online advertising spend, and Facebook will be hard pressed to expand its revenue beyond this segment.  Search?  It’s hard to imagine a scenario in which users are relaxing their privacy settings to allow their content to appear in search-engine search results, and even harder to imagine an advertiser paying for an impression tied to a search result returning some user-generated content on Facebook.

Facebook already has 19% of the total worldwide display-related ad spend.  How much of the total display advertising pie can Facebook expect to command?  Probably less than what Google commands, which is about where Facebook is now.

4.  Facebook won’t even dominate the “display” segment

Why?  It can be summed up in two words:  brand risk.  A brand manager doesn’t want his company’s brand identity damaged on his watch.  GM’s decision to suspend its Facebook campaigns is indicative of the concern among (certain types of) brands that their brand identity will appear adjacent to inappropriate content.  Companies like AdSafe and DoubleVerify (and others) are already making good money by providing marketers with safety nets that block their ads from showing up in places that could damage their brands.  So it’s reasonable to believe that skittish brand managers will continue to keep their ads off Facebook, preventing the company from dominating the display segment of the online advertising market the way that Google dominates search advertising.

How else can Facebook generate revenue?  That’s a tough question.  Can Facebook charge for its service like LinkedIn does?  Can it build and monetize apps like Zynga does?  Can it charge app developers a percentage to distribute apps like Apple does?  And can any of these potentially user-annoying strategies appreciably move the needle?  Even after more than 10 years in business, Google is still generating 96% of its revenue from advertising.

One option is for Facebook to sell the data it has on its users to advertisers.  While a potentially lucrative revenue stream, it has the very real ability to creep out its users to the point where they abandon the site.  Also of concern is the W3C’s Do Not Track initiative, which could make it a lot more difficult for Facebook to collect and use targeting data on its users in the future.

5.  Inconsistent financials

Even if you believe in Facebook’s long term prospects, there’s near-term risk not only from the effect of lockup expirations, but also from the company’s inability to post consistent numbers in the short time it has been public.

Revenues fell from Q4 2011 to Q1 2012, sequential (quarterly) revenue growth rates are slowing, and the company posted a net loss of $157 million in Q2 2012.  What does that say about management’s ability to manage the business to Wall Street’s expectations going forward?

So what is Facebook really worth?

Even at a sub-$20 per share price, Facebook boasts a $42 billion market cap and a whopping multiple of 110.

Assuming the display advertising market continues to grow at 15-20%, and Facebook can successfully grow its share of the worldwide display market from 17% to 25% over 8 years, the company will generate $16 billion in revenue in 2018, with a CAGR of 20% over the period.

For comparison, Google, which generated $38B in revenue in 2011 with 29% growth, has a $220B market cap.  Since their growth rates are comparable, and since both companies have net operating margins around 27%, and assuming that Facebook will attain/maintain the same “market leader” premium Google enjoys, one way to value Facebook stock is to apply the same market-cap-to-revenue ratio as Google.  Doing so values the stock at $15 today, growing to $42 in 2018.
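A rough check of that arithmetic, as a minimal sketch; the share count isn’t in the post, so it’s backed out here from the $42 billion market cap at a roughly $19 share price:

```python
# Back-of-the-envelope check of the Google-comparable valuation above.
goog_cap, goog_rev = 220e9, 38e9      # Google: market cap, 2011 revenue
cap_per_rev = goog_cap / goog_rev     # ~5.8x "market leader" multiple

fb_shares = 42e9 / 19.0               # ~2.2B shares implied by the post (assumption)
fb_rev_2018 = 16e9                    # projected 2018 revenue from above

price_2018 = cap_per_rev * fb_rev_2018 / fb_shares
print(f"Implied 2018 price: ${price_2018:.0f}")  # ~$42
```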

A less optimistic view of Facebook’s future prospects is to compare it to Yahoo, whose average revenue per user is $8 (to Google’s $29 and Facebook’s $4).  This comparison values FB at just $9 today.


In Search of Excellence by Moonwalking with Einstein

I just finished reading Moonwalking With Einstein:  The Art and Science of Remembering Everything, by Joshua Foer.  It’s a very interesting read about the neuroscience of memory and the memory techniques that have been used throughout the ages to perform seemingly superhuman feats of memorization (which apparently can be learned and used by just about anyone).

I found the following passage illustrative, and worth excerpting here.  It rings true, and was – at least for me – the highlight of the book…

“What separates experts from the rest of us is that they tend to engage in a very directed, highly focused routine, which Ericsson has labeled ‘deliberate practice.’  Having studied the best of the best in many different fields, he has found that top achievers tend to follow the same general pattern of development.  They develop strategies for consciously keeping out of the autonomous stage while they practice by doing three things:  focusing on their technique, staying goal-oriented, and getting constant and immediate feedback on their performance.  In other words, they force themselves to stay in the ‘cognitive phase.’

Amateur musicians, for example, are more likely to spend their practice time playing music, whereas pros are more likely to work through tedious exercises or focus on specific, difficult parts of pieces.  The best ice skaters spend more of their practice time trying jumps that they land less often, while lesser skaters work more on jumps they’ve already mastered.  Deliberate practice, by its nature, must be hard.

When you want to get good at something, how you spend your time practicing is far more important than the amount of time you spend.  In fact, in every domain of expertise that’s been rigorously examined, from chess to violin to basketball, studies have found that the number of years one has been doing something correlates only weakly with the level of performance.  My dad may consider putting into a tin cup in his basement a good form of practice, but unless he’s consciously challenging himself and monitoring his performance – reviewing, responding, rethinking, rejiggering – it’s never going to make him appreciably better.  Regular practice simply isn’t enough…

The best chess players follow a similar strategy.  They will often spend several hours a day replaying the games of grand masters one move at a time, trying to understand the expert’s thinking at each step.  Indeed, the single best predictor of an individual’s chess skill is not the amount of chess he’s played against opponents, but rather the amount of time he’s spent sitting alone working through old games.

The secret to improving at a skill is to retain some degree of conscious control over it while practicing – to force oneself to stay out of autopilot.  With typing, it’s relatively easy to get past the OK plateau.  Psychologists have discovered that the most efficient method is to force yourself to type faster than feels comfortable, and to allow yourself to make mistakes.  In one noted experiment, typists were repeatedly flashed words 10 to 15 percent faster than their fingers were able to translate them onto the keyboard.  At first they weren’t able to keep up, but over a period of days they figured out the obstacles that were slowing them down, and overcame them, and then continued to type at the faster speed.  By bringing typing out of the autonomous stage and back under their conscious control, they had conquered the OK plateau.”


Two words of advice for the “Gang of 12” – Predictive Analytics

It’s now up to the Gang of 12 super-committee created by the debt deal to come up with a plan to trim the deficit.  If history is any guide, we can expect a lot of rhetoric and emotionally charged debate around the Bush tax cuts, entitlement spending, big government, shared sacrifice and the like, and a disappointingly poor outcome as the result.

This is in stark contrast to the way industry handles complex, multivariate, big data predictive analytics problems, which is essentially what we’re talking about here.

The government is spending more than it takes in, so it’s got to spend less, and/or take in more, while at the same time, counter intuitively, stimulating growth.  Raise taxes too much, or on the wrong people, and we stifle growth.  Cut spending too much, or for the wrong programs, and we get the same result.

At the same time, do we even know how much debt is acceptable, even healthy, given historically low interest rates (the August 12th 10-year treasury auction produced a 2.29% yield, the lowest in history)?

Hard problem.

Yet I see hard problems like these being successfully addressed by industry every day.  Investment banks and portfolio managers are performing predictive analytics to model the effects of a range of micro- and macroeconomic scenarios on risk and performance.  Pharmaceutical companies can simulate synergies among various compounds to model the effectiveness of potential vaccines and treatments.  Online companies are crunching massive amounts of online data to probe what-if scenarios to intelligently segment and target users.

Recent advances in data processing and analytics technologies, including Hadoop, in-database analytics, database appliances, etc., are all being leveraged to apply powerful analysis and simulation techniques to the most complex multivariate predictive analytics problems – problems that are at least as hard as effectively managing our country’s finances.  The left-right, rhetoric-based, emotionally charged approach to solving this country’s financial problems is a far cry from the sophisticated and highly effective approaches in use today throughout industry.
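To make “what-if scenario” concrete, here’s a deliberately toy sketch of a scenario sweep; the baseline figures are rough FY2011 numbers, and the growth-drag elasticity is invented purely for illustration:

```python
# A toy what-if sweep over tax increases and spending cuts.  The
# 'growth_drag' elasticity is a made-up stand-in for a real fitted model.
revenue_base, spending_base = 2.3e12, 3.6e12   # rough FY2011 figures, USD

def deficit(tax_up, cuts, growth_drag=0.5):
    """Projected deficit under a tax increase and spending cut (as fractions)."""
    # Assume fiscal tightening drags revenue through slower growth.
    revenue = revenue_base * (1 + tax_up) * (1 - growth_drag * (tax_up + cuts))
    spending = spending_base * (1 - cuts)
    return spending - revenue

for tax_up in (0.00, 0.05, 0.10):
    for cuts in (0.00, 0.05, 0.10):
        print(f"taxes +{tax_up:.0%}, cuts {cuts:.0%}: "
              f"deficit ${deficit(tax_up, cuts) / 1e12:.2f}T")
```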

My advice to the “Gang of 12” – please hire some super-smart data analysts and do some math.


Google Buying Motorola’s Handset Division for $12.5 Billion

Today Google takes a page from Apple’s playbook, announcing its intention to acquire Motorola’s handset division (MMI) for $12.5 billion.  The resulting merger should provide Google with plenty of opportunities to streamline the behavior of the hardware and OS for the benefit of consumers.

At the same time, it will put Google’s “Don’t Be Evil” code of conduct to the test.  For example, in the past Apple has used its hardware/software dominance to track users’ location without their consent.  And Apple still disables third-party cookies by default to make it that much harder for competitors to play in the targeting / advertising ecosystem.  The opportunities for Google’s mobile environment to become less open, a la Apple, now that it owns both the OS and the device, are significant.

The Android network currently includes 39 device manufacturers.  Andy Rubin, Senior Vice President of Mobile at Google, is quoted in today’s press release as saying, “We expect that this combination will enable us to break new ground for the Android ecosystem. However, our vision for Android is unchanged and Google remains firmly committed to Android as an open platform and a vibrant open source community. We will continue to work with all of our valued Android partners to develop and distribute innovative Android-powered devices.”

While the prospects for the players in the Android ecosystem are less certain, Android consumers and Google shareholders should be pleased with the news.

You can read Larry Page’s view on the acquisition here.


Why Is Apple Tracking Your Location?

It’s all part of their master plan. And it appears to be working.

[Image: iPad 2]

Apple just announced another blockbuster quarter. Almost $25B in revenue (up 83% year over year), 41% gross margins, earnings are up 95%, Mac shipments are up 28%, iPhone shipments are up 113%, and another 4.7 million iPads went out the door in the quarter ended March 31.

But look closely, and you’ll see that Apple is quietly turning itself into a different kind of company: a company that views content as a revenue driver in its own right, not just a way to move (expensive) boxes and make them sticky. iTunes, for example, now accounts for more than 5% of total revenue, and with book downloads to the iPad growing rapidly, iTunes appears to be becoming a significant profit center.

Apps are becoming a meaningful source of revenue as well. In January, the much heralded 10 billionth app was downloaded from the App Store, with 30% of all app revenues flowing directly to Apple (more than $1 billion to date).

But perhaps the most interesting element of Apple’s growth strategy might be its disruptive approach to the rich online advertising market. Last year, total online advertising spend exceeded $26 billion. And while only $1B was spent on mobile, the projected growth rates for mobile advertising are truly astounding. Forecasts are all over the map, but Gartner (as good a proxy as any) predicts mobile ad spend will grow to $3.3B in 2011, and to more than $20B by 2015.

Why such massive growth? Because mobile (and especially location-based) ads are tremendously more effective than conventional online ads.  According to the Mobile Marketing Association, nearly 50% of users who are shown a location-aware ad on a mobile device will take some action, compared with typical clickthrough rates on web based banner ads of 0.2 – 0.3%. Put another way, mobile location based ads are roughly 250 times more effective than conventional ads. That’s something that advertisers will clearly pay for.

But the online advertising ecosystem has already become pretty crowded.

There’s not much room in that crowded ecosystem for Apple to garner a significant market share. They’re boxed out.

So Apple apparently has figured out a way to change the game (again). They created and marketed “apps.” In the browser, the dollars flow from advertiser (Infiniti) to agency (WPP Group) to the myriad data optimization / DSP / RTB / analytics / yield optimization / etc. / etc. companies, and finally to the publisher (www.cnn.com). There are a lot of hands in the till.

But with apps and iAd, there is no ecosystem. There are the app developers, and there’s Apple. That’s it. 60% of advertising revenues go to the app developer, and fully 40% of the advertising spend flows directly to Apple. With a projected $20 billion total mobile advertising market by 2015, this is a big opportunity for Apple.

What Apple needs for its plan to succeed is massive adoption of apps.

Enter the Apple marketing machine. There are now more than 100 million Apple mobile devices in the market, and Apple has the “it” device of the moment:  a tablet that’s on fire.

And so are apps.  They’re so cool!

It’s hard to believe a major brand like the New York Times devotes two-page spreads to promote its iPad app. The Times already has a beautiful website and they’re successfully monetizing it with expensive advertising and subscriptions. I don’t think they’re promoting their iPad app because it’s a huge moneymaker (it’s free), or because it provides a significantly better experience than the physical paper or their website (it doesn’t), or even for the ad revenues (are they making more from their app advertising than they make from ads on their website?). Apple’s marketing machine has created a movement: they’ve made it cool to have an app. So cool, in fact, that the New York Times (and every other brand, seemingly) is devoting serious real estate to promoting theirs.

Kudos (once again) to Apple’s marketing machine.

Apps are Apple’s iAd delivery mechanism. Apple is building yet another closed environment – this time, for advertising.  And now we find out that Apple has been surreptitiously monitoring and recording iPad and iPhone users’ geolocation, no doubt so that they can monetize users’ locations through geo-targeted ads (increasing ad effectiveness by 250 times or so). Apple also is syncing the location file from the mobile devices to the desktop devices, arguably to serve the same type of geo-targeted ads to users on their desktops and laptops (based on the journeys of their iPhones and iPads), further increasing revenues.

As of today, Apple has a market cap of $323 billion, making it the second largest U.S. company, behind only Exxon Mobil (XOM) at $428B.

Is Apple too expensive?

Apple has TTM revenues of $76B and $17B in net income, compared with XOM’s $342B in revenue and $30B in net income. There is no doubt that Apple’s stock price factors in some pretty aggressive growth assumptions. But Apple is executing extremely well (see latest 10-Q). And recent events (like being caught geo-targeting all their mobile users) would suggest they are also executing on their master plan to capture a significant share of the rapidly growing mobile advertising market.

It appears Apple is attempting to recast the online advertising space in much the same way it did the PC market, the music industry, the mobile phone market, and the portable music player market.

Bet against them at your own risk.


Better Real-time Behavioral Targeting for Online Advertising

The enabling technologies exist for online advertisers to employ proven, sophisticated, predictive behavioral targeting techniques in their advertising campaigns. Yet only a small minority of online advertising – primarily RTB platforms and the more sophisticated exchanges – leverages these techniques, and even the most sophisticated of these pale in comparison to the level of analytic targeting that traditional marketers have been using for decades.

It’s quite surprising.

The quantitative benefits of classical database marketing techniques are well understood. By targeting the segments most likely to respond to an offer or least likely to default on a credit card or loan, marketers have been dramatically increasing returns on investment for decades.

These same fundamental targeting techniques have been proven to increase click-through and conversion rates for online advertising initiatives as well.  And online campaigns can often be even more targeted than offline campaigns, due to the rich online behavioral data that can be stored and mined, and the opportunities for dynamic personalization of the creative (ad) that are not possible in the offline world.

Yet, even so, behavioral targeting is far less common for online advertising initiatives than for conventional marketing initiatives. The result of this lack of targeting is that much of the spend on online advertising is wasted. eMarketer reports that only 14% of online display ads are targeted, and comScore estimates that as much as 80% of impressions are shown to the wrong audience.

Studies prove that database marketing techniques can dramatically reduce this waste by targeting impressions to the right audience at the right time and in the right place. One research study that analyzed 7 days’ worth of advertising click-through data logs from a commercial search engine found an increase in click-through rates (CTRs) of 670% with advertising that segmented users according to their behavior through the use of classical clustering algorithms. The research also found that by utilizing advanced user representation and user segmentation algorithms, CTRs can be further improved to more than 1,000%. And an advertising data company reports that by utilizing simple re-targeting strategies it successfully increased click-through rates by 130%, and by also using customized re-targeting creative, total return on advertising spend increased from 114% (with static ads) to 1,459%.
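For the curious, here’s a toy sketch of the kind of classical clustering those studies describe – users represented by per-category page-view counts and grouped with k-means. The feature columns and counts are made up for illustration:

```python
# Toy behavioral segmentation with k-means; all of the data here is invented.
import numpy as np
from sklearn.cluster import KMeans

# rows = users, columns = page views of [sports, finance, travel] content
users = np.array([
    [12, 0, 1],
    [10, 2, 0],
    [0, 9, 1],
    [1, 11, 0],
    [0, 1, 8],
])

segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(users)
print(segments)  # e.g. [0 0 1 1 2] -> serve each segment different creative
```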

So, why isn’t behavioral targeting more prevalent with online advertising? Part of the reason is that the default ad serving technologies are limited in the amount of behavioral targeting they can provide. Even so, all of the enabling technology is readily available to allow sophisticated behavioral targeting for display and rich media advertising.

There exists a rich set of criteria and data that can be used to customize creative, and to segment and target users, including:

• Cookie values: Cookies allow marketers to record visitor behavior and actions, and identify previous visitors for re-targeting. By writing relevant session details to a cookie, then reading the cookie values when users return to the site, marketers can easily re-target users. What’s more, by writing the cookie data to log files which are parsed and loaded into the marketer’s data warehouse, marketers can build sophisticated statistical marketing models to drive their targeting efforts, and connect a visitor, by cookie value, to the rich store of information on that user in their database.

• Geo-location data: This data includes the location of the visitor (including mobile geo-location for mobile device users), as well as connection type, ISP, and other geo-location specific information. Geo-location based targeting can have significant impact on a campaign: One study reports that 50% of users who are shown a location-aware ad on a mobile device will take some action, compared with a 0.2% CTR for a conventional banner ad on a website.

• HTTP header values, request URL (including query argument values), form field contents, language preference, browser type and version, and other settings can all be used as input to the statistical models, for targeting, and as criteria for customizing the creative.

The appropriate creative can be programmatically selected by the marketer to deliver highly contextual, customized advertising in response to each request, based on the attributes of the request, the information in the cookie file, and the marketers’ targeting algorithms. What’s more, creative can be programmatically customized in real-time – for example by dynamically adjusting the JavaScript in the ad – based on the data associated with the request and the targeting algorithms, further targeting each display and increasing CTRs.
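As a concrete (if toy) illustration of that selection step, here’s a minimal sketch; the segment names, cookie fields, and rules are all hypothetical:

```python
# A toy creative-selection function; every field name and rule is made up.
def choose_creative(cookie: dict, geo: dict, headers: dict) -> str:
    """Pick an ad variant from request attributes and stored behavior."""
    if cookie.get("last_viewed_category") == "running_shoes":
        return "retarget_shoes_banner"           # re-targeting path
    if geo.get("city") and headers.get("Accept-Language", "").startswith("en"):
        return f"local_offer_{geo['city']}"      # geo-customized creative
    return "default_brand_banner"                # fallback creative

# Example request:
print(choose_creative({"last_viewed_category": "running_shoes"},
                      {"city": "boston"},
                      {"Accept-Language": "en-US"}))
```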

One last piece of technology is needed to complete the picture: Pixel Tracking.

Internet browsers prevent scripts served from one domain from accessing data on another domain. This technical limitation would prevent an advertiser from performing most activities required in order to execute behavioral targeting campaigns. Fortunately, there is a common workaround that involves placing a 1×1 clear pixel image (or any other image) on specific web pages (or ads) that an advertiser wishes to track. Whenever a user renders a page, views an ad, plays a clip or triggers some other event, the client application generates a request to the advertiser’s server, enabling the server to take notice of the activity. As the user agent (such as a browser or rich media player) makes the request, it can pass along information about the user, as well as specific data (such as cookie values, language preference, etc.) that can be discerned from the request data, and the transaction and its details can be recorded in log files. All data can be recorded without personally identifiable information about the user, providing an anonymous set of data.
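Here’s a minimal sketch of such a pixel endpoint, using only Python’s standard library; the port, log file name, and record format are illustrative assumptions, not anyone’s production setup:

```python
# A minimal tracking-pixel endpoint sketch; details are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime, timezone

# A 1x1 transparent GIF, byte for byte.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00"
         b"!\xf9\x04\x01\x00\x00\x00\x00"
         b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the anonymous details carried by the request so they can
        # later be parsed and loaded into the data warehouse (no PII).
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "path": self.path,                        # event name + query args
            "cookie": self.headers.get("Cookie", ""),
            "referer": self.headers.get("Referer", ""),
            "lang": self.headers.get("Accept-Language", ""),
            "ua": self.headers.get("User-Agent", ""),
        }
        with open("pixel.log", "a") as f:
            f.write(repr(record) + "\n")
        # Answer with the invisible image so the page renders normally.
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    HTTPServer(("", 8080), PixelHandler).serve_forever()
```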

By utilizing these technologies for pixel tracking, reading and setting cookie values, detecting user settings, and logging activity details to log files, marketers can build sophisticated behavioral targeting algorithms far more advanced than today’s norms, and successfully serve much more highly customized and targeted ads to each visitor.

While the most sophisticated platforms in the advertising eco-system are making progress in raising the bar on behavioral targeting, there seem to be some pretty compelling business opportunities that have yet to be tapped.


Web Analytics: Overview, Options, and Technology Enablers

I am doing a lot of web / clickstream analytics work for various companies with a wide range of sophistication and latency requirements, so I’m thinking a lay of the land might be useful…

Access to timely clickstream data from a company’s website provides insight into online visitor behavior and patterns, which in turn enables companies to be more effective in myriad ways: with improved pricing, more effective campaigns and offers, better visitor segmentation and targeting, optimized website layout and workflow, and more.

Many companies use off-the-shelf web analytics products, which often make it relatively simple to monitor and analyze website activity, eliminating the need for significant IT efforts. There is no shortage of these web analytics products from which companies can choose.

As companies become more sophisticated in their web analytics requirements, though, they often need to augment the capabilities of these packaged web analytics products. Building and maintaining an internal clickstream data warehouse (CDW) enables these companies to manage, segment and report on the data in ways that the packaged products do not.

Until recently, building and maintaining a CDW has been prohibitive for most companies because of the volume and complexity of the warehoused data, as well as the volume and complexity of the raw source data files. However, recent advances in data warehousing technologies, such as columnar databases and data warehousing appliances that are designed to deliver very high levels of performance with massive and complex data sets, have made CDWs a practical option for many companies.

In addition to requirements for high performance OLAP database technologies, the raw clickstream source data, which often exists as huge, complex text files, must be parsed, structured, cleansed and loaded on a regular – sometimes daily – basis into the CDW before the CDW can provide value. This often introduces additional cost, complexity and delay. So the ability to process the raw clickstream data files in a rapid, cost-effective and scalable manner is also a critical component for any CDW initiative to be successful.
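To give a feel for the parse/structure step, here’s a minimal sketch that flattens web server logs in the standard Apache/NGINX “combined” format into a CSV ready for bulk load; the file names are placeholders:

```python
# Sketch: parse "combined" format access logs into a CSV for bulk load.
# File names are illustrative; malformed lines are skipped, not fatal.
import csv
import re

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

FIELDS = ["ip", "ts", "method", "url", "status", "bytes", "referer", "agent"]

with open("access.log") as src, open("clicks.csv", "w", newline="") as dst:
    out = csv.writer(dst)
    out.writerow(FIELDS)
    for line in src:
        m = LINE.match(line)
        if m:
            out.writerow(m.group(*FIELDS))
```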

Traditional Web Analytics Products

Off-the-shelf software as a service (SaaS) web analytics products have been available for years. One major vendor reports that it has over 5,000 customers, and some of the major search engine companies offer popular web analytics products for free.

These products use a variety of underlying technologies (including page tagging, packet sniffing, and others) to collect a company’s website visitor data on the analytics vendors’ servers, and then provide each customer with capabilities to report on their specific website data. The primary benefit of using a SaaS web analytics product is that it requires much less effort than taking the data in-house and building a clickstream data warehouse. For many companies, these products provide sufficient levels of detail and flexibility in their reporting.

However, problems and limitations exist with these products, including:

• Lack of user-centric segmentation. Although useful for tracking activity in a page-centric manner, for many companies the specific information and segmentation that is available with off-the-shelf web analytics products does not satisfy their requirements for user-centric information. So, while customers are able to track the number of visitors, page views, and conversions on their website, they are unable to segment the data by user session to understand what a user does in a particular session, and to track a user’s activity across multiple sessions, for example.
• Historical analysis. The pages to be tracked and the tracking criteria must be defined in advance – it is impossible to report on new criteria from previous (historical) website activity. The new criteria must first be defined, and only subsequent activity can be tracked.
• Visitor tracking limitations. A limitation with technologies like page tagging is that not all user visits are tracked. For example, for visitors that have deleted cookies, or that don’t have JavaScript enabled (such as on mobile devices), the visits are not recorded.
• Object tracking limitations. Activities with certain object types, such as PDF views and file downloads, are not tracked.
• Server tracking limitations. Since most of these products rely on code that is executed in the client, they cannot report on server responses, such as failed requests, response times, etc.
• Confidentiality. Web analytics vendors store all of their customers’ web traffic data on their own servers. For some companies (and government organizations), the risk of their proprietary website analytics information being used without their knowledge or approval is not acceptable.

These issues, and others, are sufficient for some companies to build and maintain their own CDW from their clickstream data, and perform analytics on the data with sophisticated Business Intelligence (BI) software.

Clickstream Data Warehouse

A clickstream data warehouse is used to store all of the historical website activity in a structured format – typically on the company’s own servers – so that sophisticated queries and reports can be run on the data with BI software. Because of the large volume of clickstream data generated on a daily basis, and the large number of fields in the data, the prospect of implementing a CDW can be daunting. Even so, the business advantages of augmenting – or supplanting – packaged SaaS web analytics products with a CDW often provide sufficient justification for companies to undergo the initiative.

Benefits of implementing a CDW include:

• Flexibility. Since the company has all of the data, it can process, segment, and report on the data in whatever ways it chooses. For example, the ability to segment the data into unique user sessions and to combine multiple visits of a particular user over time provides rich insight into customer value (see the sessionization sketch after this list).
• Combining multiple touchpoints. Combining a customer’s clickstream activity with data from customer support, procurement, POS, and other operational systems provides companies with a more complete view of the customer, and allows for more precise customer scoring.
• Historical analysis. With a CDW, queries do not need to be pre-defined. Days, months, or years after the activity, an organization can ask new questions of the data that it did not initially think to ask.
• More powerful BI. Companies often use sophisticated BI software with their CDW, providing analytics capabilities far beyond what off-the-shelf web analytics products provide.
• Superior tracking. Web server logs have far fewer limitations tracking objects and visitors than other technologies such as page tagging. Web server logs capture activity regardless of the characteristics of the client, and include user activity with PDF files, file downloads, server response times, etc.
• Confidentiality. CDWs built from web server logs eliminate any risk that an analytics vendor will share the data, since all of the data remains inside the company (on the web servers and in the company’s CDW).
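Sessionization, mentioned in the first bullet above, is straightforward once the data is in-house. A minimal sketch, using the common (but arbitrary) 30-minute inactivity cutoff:

```python
# Split one user's ordered page views into sessions on a 30-minute timeout.
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

def sessionize(events):
    """events: list of (timestamp, url) tuples sorted by timestamp."""
    sessions, current = [], []
    for ts, url in events:
        if current and ts - current[-1][0] > TIMEOUT:
            sessions.append(current)   # gap too long: close the session
            current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions

views = [(datetime(2011, 8, 1, 9, 0), "/home"),
         (datetime(2011, 8, 1, 9, 5), "/pricing"),
         (datetime(2011, 8, 1, 13, 0), "/home")]   # new session after a 4h gap
print(len(sessionize(views)))  # 2
```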

Clickstream Source Data

Just as different SaaS web analytics vendors use different underlying technologies – such as page tagging or packet sniffing – to track clickstream data, a CDW can leverage various sources of clickstream data as input.

For example, companies that are already using a web analytics product often use their analytics vendors’ source data as input to their CDW. All of the major analytics vendors offer their customers access to their full source data via batch delivery services or APIs. This enables them to continue to leverage their prior investments in customizing the product, and works around many of the limitations of the SaaS offering.

Alternatively, a CDW can be built by processing the raw log files written by the web servers. Since the web server log files contain every transaction, these files provide more complete data to be mined, eliminating many of the limitations of client-side tracking technologies, such as the inability to track visitors running clients without JavaScript.

Some companies choose to build CDWs by combining multiple data sources that use different types of clickstream data. Doing so enables companies to leverage the benefits of multiple underlying technologies – for example, by enriching the batch data files from their web analytics vendor with the web server log data.

Enabling Technologies

Until recently, the massive amount of computing resources required to effectively work with the clickstream data made CDW initiatives prohibitive for most companies. However, recent advances in data warehousing technologies, including massively parallel processing (MPP) architectures and columnar databases, require less investment in hardware resources and deliver significantly more attractive price-performance ratios than ever before. As a result, many of the newer, successful CDW initiatives rely on these high performance data warehousing technologies.

In addition to high performance data warehousing technology, another critical component of a successful CDW implementation is high performance data transformation technology that can parse, structure, and cleanse the raw clickstream source files to initially populate the CDW, and refresh the CDW on an ongoing basis. Since these source files are typically very large and complex, and require significant processing to extract the desired information, the highest-performance transformation technologies available should be used.

Finally, some of the newer analytic database technologies provide extremely fast import capability, so that the data can be available for reporting and analytics minutes or even seconds after the actual event is logged, in some cases enabling behavioral targeting for in-session, programmatic control of workflow, content, offers, and advertising.

Clickstream data is a rich source of information for companies. While many web analytics products are available, limitations associated with these products often drive companies to undertake their own internal clickstream data warehouse initiatives. With a clickstream data warehouse in place, companies can segment individual user sessions, combine customers’ online activity with data from other operational systems, and overcome many other limitations of the SaaS web analytics products.

New, high performance data warehousing technologies are often used for these initiatives, due to the massive data volumes and the complexity and cardinality of the data. High performance data transformation technology that can parse, structure, and cleanse the raw clickstream source files to initially populate the CDW, and to refresh it on a regular basis, is also available.

These technologies make it simpler and more cost effective for companies to manage their clickstream data in-house than ever before, effectively raising the bar on the insight and benefits that can be obtained via clickstream analysis.

Some relevant links:
http://hadoop.apache.org/
http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html
http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html
