Statistical Analysis and Reporting as Applied to Unique Characteristics of Streamed Media
Abstract: The reporting and analysis of streamed media files differ greatly from the reporting and analysis of text and graphics files requested for display from web sites. Audio and video files presented in real time over a span of time have unique characteristics that raise new questions and challenges when attempting to understand user patterns and report on traffic. In this short paper, measurement systems used within the traditional media markets of radio and television are reviewed. Next, contrasts are drawn between statistical analysis and reporting for streaming media requests and current web site analysis tools. Finally, questions and comments are posed as to how Internet usage of both media and information through digital streaming will be characterized in the future.
Traditional Media Usage Models
The delivery of audio and video media over the Internet is recent. The delivery of radio and television signals into the home has a much longer history, with both media delivery systems evolving into a number of business and communications models. It makes sense, then, to look at how these two media institutions measure the usage and trends of their respective markets and to apply what is appropriate to the Internet world.
The key players in these two media are Arbitron for radio and Nielsen for television. From the outset, the greatest difference in usage analysis is that both rely on sampling, whereas on the web nearly complete coverage of user requests is achieved. In time, as activity grows across the web and even within closed networks that use streaming media, sampling may offer a greater advantage, though it will likely play a limited role within the larger body of complete statistical analysis and reporting programs.
Nielsen Media Research (www.nielsenmedia.com), active in television ratings since 1950, extrapolates data for approximately 99 million television households from a sample of five thousand viewers. A black box keeps track of what is being viewed on the television at any one time and is invisible to the viewer, much as a log file is invisible to a web audience. This is one of several methods used, alongside diaries during the "sweeps" periods. The requirements for the analysis are fairly simple: "provide an estimate of audience for just about every program that can be seen on TV". That is, how many people are watching and at what time. Additionally, Nielsen tracks the commercials that have aired on television and ranks television programs by number of viewers.
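Because a ratings service measures a sample rather than every set, each reported audience is a projection with a margin of error, something a complete log file does not need. The following is a minimal sketch of that projection, assuming a simple random sample and a normal-approximation confidence interval (real panels use weighted designs). The count of 600 sampled units tuned to a program is hypothetical; the 5,000 sample and 99 million households come from the paragraph above.

```python
import math


def project_rating(tuned_in: int, sample_size: int, universe: int, z: float = 1.96):
    """Project a sample rating onto the full universe of households.

    Assumes a simple random sample and uses the normal approximation for a
    ~95% confidence interval; real ratings panels use weighted designs.
    """
    p = tuned_in / sample_size                  # sample rating (a proportion)
    se = math.sqrt(p * (1 - p) / sample_size)   # standard error of that proportion
    low = max(0.0, p - z * se)                  # clamp interval to [0, 1]
    high = min(1.0, p + z * se)
    return p * universe, low * universe, high * universe


# Hypothetical: 600 of 5,000 sampled units tuned to one program.
estimate, low, high = project_rating(600, 5_000, 99_000_000)
print(f"estimate {estimate:,.0f} households (95% CI {low:,.0f} to {high:,.0f})")
```

With these assumed numbers the estimate of roughly 11.9 million households carries an uncertainty of close to a million in either direction, which is the price of sampling that web log analysis largely avoids.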
The Arbitron Company (www.arbitron.com) is another well-known media research firm. Among other information services, it measures radio audiences in local markets across the United States. The Arbitron system of radio ratings relies on diaries kept by listeners to create its reports. Over two million listeners are recruited into the program each year, accounting for more than a million diaries. Arbitron then provides local markets with "software tools for analyzing its quantitative measures and integrating that information with local market consumer data." Important to the radio format is the need to understand total listening time for a particular frequency along with specific programs. Interestingly, Internet-only radio stations are cropping up on the Web at a great rate, creating greater urgency for accurate measurement along the business lines of the Arbitron rating system.
An example of an Internet broadcast of a traditional radio station shows some interesting numbers. In January 1998, KIRO710 Newsradio (www.kiro710.com), a Seattle news and information radio station, accrued over 40,000 file requests for its broadcast over the Internet. By far the most listened-to segment was its live broadcast, which averaged slightly more than 6.5 minutes of listening time per instance. There is little doubt that the 710 AM station's numbers have grown since. Given that KIRO is one of the most popular radio stations in the Puget Sound region, these numbers may not seem especially impressive, but many small stations would be pleased with such figures.
Both Arbitron and Nielsen have made forays into ratings analysis and reporting in the Internet space, with the business emphasis on providing independent, third-party authentication of traffic for publishers. Joining these two is a third organization, the Audit Bureau of Circulations, whose roots are in the publishing industry.
A final note before moving on to streamed media analysis: the aforementioned media services are supported by a business model of advertising and sponsorship within the medium itself. The same is true of the web in its current state, as it clings to the conventions of its predecessors. One may speculate that as the web extends itself as a commercial vehicle for direct purchasing, total audience size and length of engagement may become secondary to how much business is driven through the site.
Streamed Media Analysis and Reporting
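As of the second half of 1998, there are only a few differences between the methods of tracking streamed media and the types of information provided by many of today's web statistics packages. The areas considered here are player identification, error reporting, trend reporting, codec identification and the duration of requests.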
The first two, player identification and error reporting, are functions that already exist in many statistical evaluation packages for the web. The difference lies in the players and errors themselves. For streamed media there is, and for the time being will continue to be, a media player battle. RealNetworks (www.real.com) has the most popular player and leads in development and distribution. Microsoft (www.microsoft.com/netshow/) gives its player away for free and will eventually bundle it into its operating system. Motorola is making its bid in streamed audio, and Apple's QuickTime has an installed base of users and is expected to get a push soon. These echoes of the browser wars, which are still being fought, suggest that tracking player type and version is necessary when planning for media publishing and distribution.
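Tracking player type and version amounts to grouping requests by the player string each client reports. The sketch below assumes a hypothetical log already reduced to (client address, player string) pairs in a "Name/version" form; actual media server logs each have their own layout, so the parsing step would need to match the server in use.

```python
from collections import Counter

# Hypothetical log records: (client_ip, player_string). The "Name/version"
# layout is an assumption for illustration, not a specific server's format.
REQUESTS = [
    ("10.0.0.1", "RealPlayer/5.0"),
    ("10.0.0.2", "RealPlayer/5.0"),
    ("10.0.0.3", "NSPlayer/3.0"),
    ("10.0.0.4", "QuickTime/3.0"),
    ("10.0.0.5", "RealPlayer/4.0"),
]


def player_breakdown(requests):
    """Count requests by (player name, version)."""
    counts = Counter()
    for _ip, agent in requests:
        name, _, version = agent.partition("/")
        counts[(name, version or "unknown")] += 1
    return counts


counts = player_breakdown(REQUESTS)
for (name, version), n in counts.most_common():
    print(f"{name} {version}: {n}")
```

Keeping the version separate from the name lets a publisher see, for example, how much of its audience would be stranded if it encoded only for the newest release of one player.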
Error reporting is a required part of activity analysis; for streamed media it concentrates mostly on packet loss, rebuffering instances and incorrect codec selection. As servers become more aware of the client, codec selection errors will be reduced if not eliminated entirely. A primary goal of error reporting for streamed media is to measure the technical success and quality of system delivery.
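The three error types named above can be rolled up from per-session summaries into a delivery-quality report. This is a minimal sketch that assumes a hypothetical `Session` record carrying packet counts, a rebuffering count and a codec-mismatch flag; the field names are illustrative and do not come from any particular server.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One hypothetical playback session summary; field names are illustrative."""
    packets_sent: int
    packets_lost: int
    rebuffers: int
    codec_mismatch: bool


def error_report(sessions):
    """Summarize packet loss, rebuffering and codec errors across sessions."""
    sent = sum(s.packets_sent for s in sessions)
    lost = sum(s.packets_lost for s in sessions)
    n = len(sessions)
    return {
        # Loss is weighted by packets, not averaged per session.
        "packet_loss_pct": 100.0 * lost / sent if sent else 0.0,
        # Share of sessions that rebuffered at least once.
        "sessions_rebuffering_pct": 100.0 * sum(s.rebuffers > 0 for s in sessions) / n if n else 0.0,
        "codec_errors": sum(s.codec_mismatch for s in sessions),
    }


SESSIONS = [
    Session(1000, 20, 0, False),
    Session(1000, 0, 2, False),
    Session(500, 30, 1, True),
    Session(500, 0, 0, False),
]
report = error_report(SESSIONS)
print(report)
```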
It is the goal of many web sites to increase traffic over time, and pursuing that goal requires trend reporting. Momentum and success in building an audience or community cannot be effectively tracked without some means of measurement. If your objective is to gain a critical mass of users or viewers, then the most important trend is the upward swing of total audience. However, if your objective is to inform, inspire or teach, you may be more concerned with the percentage of time engaged in a file or program as a function of that file's or program's length. For instance, if you are providing inspirational messages to a congregation of predetermined size, and its members leave a half-hour sermon after ten minutes in May but stay on average for 75% of the message in June, you are likely to be encouraged.
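The sermon example reduces to an average completion percentage per reporting period. The sketch below assumes hypothetical listening records of (month, minutes listened) for the half-hour program; the specific minute values are invented so that May comes out at the ten-minute drop-off and June at the 75% average described above.

```python
from collections import defaultdict

PROGRAM_LENGTH_MIN = 30.0  # the half-hour sermon from the example

# Hypothetical (month, minutes listened) records for one program.
RECORDS = [("May", 10), ("May", 10), ("June", 20), ("June", 25)]


def completion_by_month(records, length):
    """Average percentage of the program heard, grouped by month."""
    fractions = defaultdict(list)
    for month, minutes in records:
        fractions[month].append(min(minutes, length) / length)  # cap at 100%
    return {m: 100.0 * sum(v) / len(v) for m, v in fractions.items()}


trend = completion_by_month(RECORDS, PROGRAM_LENGTH_MIN)
for month, pct in trend.items():
    print(f"{month}: {pct:.1f}% of the program heard on average")
```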
Until servers can reliably select the proper codec for each client, file codec identification will be important when a publisher is determining its audience reach and requirements. There is a simple trade-off to be made. The higher the codec bitrate, the better the quality of audio and video; however, the higher the bitrate, the smaller the set of people who will be able to view or listen to the media files. Though technology is on the verge of handling this by identifying the proper codec per request, until that technology is fully developed and implemented, users connected to the Internet at less than the minimum required transfer speed are simply denied access to the file.
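The bitrate trade-off and per-request codec selection can both be expressed in a few lines. This sketch assumes a hypothetical set of encodings in kbps and a hypothetical rule that a stream may use only 80% of the client's nominal connection speed; `pick_encoding` returns `None` for a client that would be denied access, and `reach` estimates what share of an audience a single encoding would serve.

```python
def pick_encoding(available_kbps, client_kbps, headroom=0.8):
    """Return the highest bitrate that fits within a fraction (headroom) of
    the client's connection speed, or None if even the lowest is too fast."""
    usable = client_kbps * headroom
    fitting = [b for b in available_kbps if b <= usable]
    return max(fitting) if fitting else None


def reach(bitrate_kbps, client_speeds, headroom=0.8):
    """Share of clients able to receive a single encoding at this bitrate."""
    ok = sum(1 for s in client_speeds if bitrate_kbps <= s * headroom)
    return ok / len(client_speeds)


ENCODINGS = [16, 20, 45, 80]        # hypothetical encodings, kbps
CLIENTS = [14.4, 28.8, 56, 128]     # hypothetical client connection speeds, kbps

for speed in CLIENTS:
    print(speed, "kbps ->", pick_encoding(ENCODINGS, speed))
print("reach of a single 45 kbps file:", reach(45, CLIENTS))
```

Under these assumptions the 14.4 kbps client gets nothing at all, which is the "simply denied access" case; raising the single encoding from 20 to 45 kbps drops the share of clients reached from three in four to one in four.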
We truly break new ground in analyzing and reporting on this type of interactive behavior with the duration of requests. If a one-hour media file is available and ten thousand requests are made for it within a week, does that translate into more successful communication than a fifteen-minute file with two hundred fifty requests in the same period? The request count alone speaks only to the attractiveness and popularity of the file's presentation and the call to action to view it. If both files were viewed for an average of twelve minutes, an argument can be made that the file with fewer requests was the far more successful communication, since its viewers stayed for 80 percent of it, compared with 20 percent for the one-hour file.
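The comparison above can be made explicit by computing two numbers per file from the figures in the paragraph: the completion ratio (average viewing time over file length) and total viewing hours (requests times average viewing time). The helper name `engagement` is illustrative.

```python
def engagement(length_min: float, requests: int, avg_view_min: float):
    """Return (completion ratio, total viewing hours) for one media file."""
    completion = min(avg_view_min, length_min) / length_min
    total_hours = requests * avg_view_min / 60
    return completion, total_hours


# (length in minutes, weekly requests, average minutes viewed)
FILES = {
    "one-hour file": (60, 10_000, 12),
    "fifteen-minute file": (15, 250, 12),
}

results = {}
for name, (length, requests, avg_view) in FILES.items():
    results[name] = engagement(length, requests, avg_view)
    completion, hours = results[name]
    print(f"{name}: {completion:.0%} watched on average, {hours:,.0f} total hours")
```

The two metrics point in opposite directions: the one-hour file wins on total exposure, while the fifteen-minute file wins on completion, which is the measure behind the argument that it communicated more successfully.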
The greatest difference, which overrides all the similarities between web site statistical reporting and streamed media reporting, is the way in which we interpret the information. How does that interpretation compare with the views of traditional media? It is a hybrid between Internet log file analysis tools such as WebTrends and NetTracker and the ratings models of traditional media, with its own unique patterns and usage thrown in. Evaluating web traffic is more likely to be undertaken to improve quality of service, to manage files or even to build an audience than as a mechanism for setting advertising and sponsorship rates. However, when the latter is the main focus, reporting on advertisement delivery and the resulting success of calls to action from the audience will be far more exacting than with current rich media broadcasting.
Within the Internet, one of the first of many major differences in gathering information on users and their patterns is that the interactive nature of file delivery allows far more accurate data collection. Imagine if traffic and usage analysis were based on diaries rather than on log file analysis and port sniffing: adult-only and offshore gambling web sites would be all but invisible, when in fact they exist in significant numbers.
Future Measurements and Characterizations
As the complexity and sophistication of digital streaming programs increase, the differences between general web site statistical reporting and streamed media server statistical reporting will grow. This will be driven by differences in file types, the ways in which files are interwoven and the general objectives behind the sites themselves. We are moving quickly from single-file delivery into multiple-file delivery within the framework of a program or presentation. Streamed media will more likely represent combinations of audio, video and multimedia than single-file programming. Just a few of the many considerations for future measurement techniques and requirements follow.
In current programming, a single shot or feed is more an anomaly than the norm. Almost everything one sees or hears, down to thirty-second commercials and radio reports, is actually an edited piece combining different short segments. Quality and style of production depend on controlling a multitude of elements in combination. Expanding further, a news program is made of many separate reports concentrated into time segments, as are many of today's more popular reality programs. What will be required is the ability to track total exposure time across various combinations of files and file types.
The Internet's delivery system allows not only programming of any length and immediate streaming; it will also evolve to the point of orchestrating many separate files, with many overlays and file type combinations, into what is perceived as a single programmed event. Currently, most analysis and reporting is captured on individual files. With the advent and coming adoption of SMIL-based presentations, as well as the aggregation of several files into single programs, program-based rather than file-based statistical analysis must evolve. The most popular media file speaks to the appeal of the immediate content; the most popular media program speaks to its success in capturing an audience and delivering information and advertising.
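Moving from file-based to program-based reporting mainly requires a mapping from each media file to the program that uses it, after which exposure is summed per program instead of per file. This sketch assumes that mapping is already known, for example from the clip list of a SMIL presentation, and that the server log yields (session, file, seconds streamed) rows; the file names, programs and log rows are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping of media files to the program (for example, a SMIL
# presentation) that references them.
PROGRAM_OF = {
    "news_intro.rm": "evening_news",
    "weather.rm": "evening_news",
    "sports.rm": "evening_news",
    "ad_spot1.rm": "evening_news",
    "interview.rm": "feature",
}

# Hypothetical log rows: (session_id, file, seconds streamed).
LOG = [
    ("s1", "news_intro.rm", 60), ("s1", "weather.rm", 90), ("s1", "ad_spot1.rm", 30),
    ("s2", "news_intro.rm", 60), ("s2", "sports.rm", 120),
    ("s3", "interview.rm", 300),
]


def exposure_by_program(log, program_of):
    """Total seconds streamed and distinct sessions, per program."""
    seconds = Counter()
    audience = {}
    for session, media_file, secs in log:
        program = program_of.get(media_file, "unassigned")
        seconds[program] += secs
        audience.setdefault(program, set()).add(session)
    return {p: (seconds[p], len(audience[p])) for p in seconds}


report = exposure_by_program(LOG, PROGRAM_OF)
print(report)
```

A file-based report over the same rows would rank the intro clip as the most requested file, while the program-based view credits the whole news program with six minutes of exposure across two sessions.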
Given the growing popularity of live programming, differences in analysis may emerge between audience engagement in a live environment and asynchronous retrieval of combinations of files that form programs. As tools emerge that allow us to enter, exit and redirect streamed media within predetermined programming, we will need to develop a system for measuring the myriad combinations of user involvement with that media. Video hot spotting is already possible and will allow the creation of multiple paths within an enclosed set of files.
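One way to begin measuring paths through a hot-spotted set of files is to record the order in which each session reached the clips, then count both complete paths and individual clip-to-clip transitions. The sketch below assumes such ordered per-session sequences are available; the session IDs and clip names are hypothetical.

```python
from collections import Counter

# Hypothetical per-session paths through an enclosed set of hot-spotted
# video clips, in the order each viewer reached them.
SESSIONS = {
    "a": ["intro", "product", "pricing"],
    "b": ["intro", "product", "pricing"],
    "c": ["intro", "demo"],
    "d": ["intro", "product"],
}


def path_counts(sessions):
    """Count complete paths and individual clip-to-clip transitions."""
    paths = Counter(tuple(seq) for seq in sessions.values())
    transitions = Counter()
    for seq in sessions.values():
        transitions.update(zip(seq, seq[1:]))  # consecutive (from, to) pairs
    return paths, transitions


paths, transitions = path_counts(SESSIONS)
for path, n in paths.most_common():
    print(" -> ".join(path), n)
```

Whole-path counts show which routes viewers actually completed, while transition counts show where they branch or drop off, which matters as the number of possible paths grows.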
Multiple-server and multiple-client reporting is a common request from the ISP market and from larger corporate intranet sites. The former must be able both to report on activities for clients across multiple streamed media servers and to separate the activity of multiple clients on a single large server. On the corporate intranet side, one large financial institution headquartered in New York receives requests for encoded media in such volume that it has an urgent need to bill bandwidth usage to specific departments. Companies using streamed media across their intranets will need flexible reports to do so. To make accounting manageable, simultaneous-user analysis, combining multiple visits into total usage reports and aggregation by subdomain will be required.
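Two of the reports named above, bandwidth totals by subdomain and simultaneous-user counts, can be sketched over log rows merged from several servers. This sketch assumes hypothetical rows of (server, client host, start second, end second, bytes sent) and assumes each department is identified by the label immediately after the host name; a real intranet's naming scheme may differ.

```python
from collections import defaultdict

# Hypothetical log rows merged from several media servers:
# (server, client_host, start_second, end_second, bytes_sent)
ROWS = [
    ("media1", "pc12.trading.example.com", 0, 600, 30_000_000),
    ("media2", "pc40.trading.example.com", 300, 900, 25_000_000),
    ("media1", "pc07.legal.example.com", 500, 700, 5_000_000),
    ("media2", "pc99.hr.example.com", 1000, 1200, 4_000_000),
]


def department(host):
    """Assumption: the department is the label right after the host name."""
    labels = host.split(".")
    return labels[1] if len(labels) > 2 else host


def bytes_by_department(rows):
    """Total bytes sent per department subdomain, across all servers."""
    totals = defaultdict(int)
    for _server, host, _start, _end, sent in rows:
        totals[department(host)] += sent
    return dict(totals)


def peak_concurrency(rows):
    """Sweep start/end events to find the maximum number of simultaneous streams."""
    events = []
    for _server, _host, start, end, _sent in rows:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal times, -1 (end) sorts before +1 (start)
    current = peak = 0
    for _t, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


print(bytes_by_department(ROWS))
print("peak simultaneous streams:", peak_concurrency(ROWS))
```

Because the rows are merged before aggregation, the same functions serve both cases described in the paragraph: one client spread across many servers, or many clients on one server.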
The use of streaming media today is the first step toward digital streaming in the near future. By examining radio and television, the precursors to interactive audience engagement, we can set a baseline for understanding the reporting, analysis and characterization needs of this emerging medium. We have already begun examining usage characteristics with the statistical tools available today. By leveraging the two, it is possible to extrapolate some of the requirements for the next generation of rich media on the web.
Andrew Fry is CEO of Free Range Media, Inc. (www.freerange.com), a web development and Internet integration company headquartered in Seattle, WA, with offices in major cities across the United States. He is President of Lariat Software, Inc. (www.lariat.com), a streaming media application software development company and wholly owned subsidiary of Free Range Media. Co-author of the Warner Books publication How to Publish on the Internet, Andrew writes articles and papers for a variety of national publications and institutions. He presented the paper "Publishing in the New Mass Medium" at the Second Annual World Wide Web Conference in 1994, the poster session "Temporal Design" at the Fourth Annual World Wide Web Conference, "Extending the Internet" at Seybold Seminars in 1996, and "The Future of Web Communities" at WebINNOVATION in 1997. Andrew Fry is a member of the Board of Directors of the Washington Software Alliance.