Facebook Inc. is testing technology that would greatly expand the scope of data that it collects about its users, the head of the company’s analytics group said Tuesday.
The social network may start collecting data on minute user interactions with its content, such as how long a user’s cursor hovers over a certain part of its website, or whether a user’s newsfeed is visible at a given moment on the screen of his or her mobile phone, Facebook analytics chief Ken Rudin said Tuesday during an interview.
Mr. Rudin said the captured information could be added to a data analytics warehouse that is available for use throughout the company for a wide range of purposes, from product development to more precise targeting of advertising.
Facebook collects two kinds of data: demographic and behavioral. The demographic data, such as where a user lives or went to school, documents a user’s life beyond the network. The behavioral data, such as one’s circle of Facebook friends or “likes,” is captured in real time on the network itself. The ongoing tests would greatly expand the behavioral data that is collected, according to Mr. Rudin. The tests are part of a broader technology testing program, but Facebook should know within months whether it makes sense to incorporate the new data collection into the business, he said.
New types of data Facebook may collect include “did your cursor hover over that ad … and was the newsfeed in a viewable area,” Mr. Rudin said. “It is a never-ending phase. I can’t promise that it will roll out. We probably will know in a couple of months,” said Mr. Rudin, a Silicon Valley veteran who arrived at Facebook in April 2012 from Zynga Inc., where he was vice president of analytics and platform technologies.
As the head of analytics, Mr. Rudin is preparing the company’s infrastructure for a massive increase in the volume of its data.
Facebook isn’t the first company to contemplate recording such activity. Shutterstock Inc., a marketplace for digital images, records literally everything that its users do on the site. Shutterstock uses the open-source Hadoop distributed file system to analyze data such as where visitors to the site place their cursors and how long they hover over an image before they make a purchase. “Today, we are looking at every move a user makes, in order to optimize the Shutterstock experience … All these new technologies can process that,” Shutterstock founder and CEO Jon Oringer told the Wall Street Journal in March.
Facebook also is a major user of Hadoop, an open-source framework that is used to store large amounts of data on clusters of inexpensive machines. Facebook designs its own hardware to store its massive data analytics warehouse, which has grown 4,000-fold over the last four years to a current level of 300 petabytes. The company uses a modified version of Hadoop to manage its data, according to Mr. Rudin. Additional software layers on top of Hadoop rank the value of data and make sure it is accessible.
The data in the analytics warehouse—which is separate from the company’s user data, the volume of which has not been disclosed—is used in the targeting of advertising. As the company captures more data, it can help marketers target their advertising more effectively—assuming, of course, that the data is accessible.
“Instead of a warehouse of data, you can end up with a junkyard of data,” said Mr. Rudin, who spoke to CIO Journal during a break at the Strata and Hadoop World Conference in New York. He said that he has led a project to index that data, essentially creating an internal search engine for the analytics warehouse.
October 30, 2013, 7:15 AM ET
By STEVE ROSENBUSH
Copyright ©2014 Dow Jones & Company, Inc