by Deni Connor, Network World

Fixed content storage grabs users’ attention

feature
May 29, 2003 | 7 mins

Users need to start evaluating their options regarding storage of fixed content data now that analysts have predicted it will consume more than half of a corporation’s storage resources by 2005.

Fixed content storage consists of data such as digital images, e-mail messages, presentations, video content, medical images and check images that don’t change over time. Unlike transaction-based data, whose usefulness is short, fixed content data must be kept for long periods of time, often to comply with retention periods and provisions that government regulations such as the Sarbanes-Oxley Act of 2002 have specified.

The Yankee Group says that the market for fixed content data will grow from 308,000 terabytes this year to 1,251,900 terabytes in 2006. Enterprise Storage Group Inc. says that fixed content reference information will represent 54% of all data by 2005 and will grow faster than traditional transaction-based and file-oriented data.
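To put the Yankee Group forecast in perspective, the two figures can be turned into an implied yearly growth rate. The short calculation below is only a sketch: it assumes "this year" means 2003 (the article's publication year) and that growth compounds evenly over the three years to 2006. The function name is illustrative.

```python
# Figures quoted from the Yankee Group forecast in the article.
START_TB = 308_000     # assumed base year: 2003 ("this year")
END_TB = 1_251_900     # forecast for 2006
YEARS = 2006 - 2003    # three compounding periods


def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


growth_multiple = END_TB / START_TB
cagr = compound_annual_growth_rate(START_TB, END_TB, YEARS)

print(f"Growth multiple: {growth_multiple:.2f}x")
print(f"Implied annual growth: {cagr:.1%}")
```

Under those assumptions, the forecast amounts to roughly a fourfold increase, or growth of close to 60% a year.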

Fixed content covers a wide variety of data that must be referenced and addressed, but it is a huge market nonetheless, and one that requires some unique capabilities to store the data and differentiate it from short-lived transaction-based data.

A number of issues are driving the growth of fixed content, analysts say.

“The biggest one is compliance,” says Jamie Gruener, a senior analyst for The Yankee Group. “Compliance is multidimensional. Not only do you need to save the information in an indexed way, you also need to be able to access the information at a fairly rapid rate. And in some cases, it has to be preserved in an unaltered, unrewritable state.”

Other issues include the type of media used for storing fixed content data and its cost.

WORM, RAID and other issues

Unlike transaction data, fixed content data does not have to be stored on equipment that has subsecond access times. Because of this, it has traditionally been stored on write once, read many (WORM) tape, disk or optical media instead of the more expensive spinning disks that transaction data requires, such as RAID arrays from EMC, Hitachi, Hewlett-Packard Co., IBM or Sun Microsystems.

“Customers need a different type of storage system, one that is able to handle more concurrent users and allow access to a point that makes sense,” Gruener says. “Tape is going to be great if you are archiving and don’t need access to data, but there are compliance regulations now that require you to be able to get to data within a 24-hour period. With tape that is not always possible.”

Users are starting to use Advanced Technology Attachment (ATA) drives to store fixed content data. Commonly used in desktop computers, ATA drives are inexpensive and capable of writing data twice as fast and retrieving data five to 10 times as fast as tape, Enterprise Storage Group says.

“I hear from customers about aligning the storage system performance and file access performance with the number of times it is accessed,” Gruener says. “If you are archiving data, there’s the assumption you don’t need to access that data every day because it’s archived. But in some of these content arenas, you will need to access the data on a regular basis, and it needs to be served up to multiple customers.”

Among the companies deploying fixed content storage systems is St. Vincent Hospital and Health Services in Indianapolis. Rich Banta, senior enterprise systems engineer at St. Vincent, was confronted with a growing amount of data that the organization’s McKesson ALI UltraPACS (Picture Archiving and Communications System) created.

Last year Banta chose a deep archiving system, the StorageTek BladeStore. StorageTek’s Application Storage Manager manages the system and, when data is no longer needed, saves it to libraries that StorageTek Automated Cartridge System Library Software manages.

“Right now, the BladeStore is configured for four terabytes, but we are going to scale it up to 12,” Banta says. “We will be able to keep about 10 to 12 months of our PACS radiology data accessible within milliseconds.”

He says that after a year, the recall rate of PACS data falls off precipitously.
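The arrangement Banta describes, recent studies on disk and older ones in tape libraries, is an age-based tiering policy. The sketch below is a hypothetical illustration of such a rule, not StorageTek's software: the one-year threshold is an assumption that rounds "10 to 12 months" to 365 days, and the function and tier names are invented for the example.

```python
from datetime import date, timedelta

# Assumption: "10 to 12 months" rounded to one year on disk.
DISK_RETENTION = timedelta(days=365)


def choose_tier(study_date: date, today: date,
                disk_retention: timedelta = DISK_RETENTION) -> str:
    """Return "disk" for studies within the retention window, else "tape"."""
    age = today - study_date
    return "disk" if age <= disk_retention else "tape"


today = date(2003, 5, 29)
for study_date in (date(2003, 1, 15), date(2001, 11, 2)):
    print(study_date, "->", choose_tier(study_date, today))
```

A production policy would more likely key off the last access time than the scan date, since the point is to keep frequently recalled studies on the fast tier.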

Banta considered archiving such information on his StorageTek L700 tape libraries immediately after it was scanned into the PACS, but rejected tape because of its retrieval time.

“If you pull it off of tape, whether it’s remote [across the network] or from a local drive, it’s going to take 68 seconds,” Banta says.

He chose StorageTek’s BladeStore instead, which uses ATA drives.

“The ATA drives are inexpensive,” Banta says. “They cost about a penny a megabyte. The architecture of this system, including making two back-up copies to tape, came out to four cents a megabyte.”

By contrast, storing fixed content to SCSI drives costs 3 to 5 cents per megabyte and Fibre Channel 7 to 15 cents per megabyte, according to Giga Information Group.
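The per-megabyte prices are easier to compare at the capacities Banta mentions. The sketch below multiplies the quoted rates by 4 and 12 terabytes. It assumes decimal units (1 terabyte = 1,000,000 megabytes), and the comparison is rough: the four-cent figure includes two tape copies, and the article does not say what the SCSI and Fibre Channel figures include.

```python
from typing import Dict, Tuple

MB_PER_TB = 1_000_000  # assumption: decimal units (1 TB = 1,000,000 MB)

# Cents per megabyte (low, high) as quoted in the article.
CENTS_PER_MB: Dict[str, Tuple[int, int]] = {
    "ATA drives alone": (1, 1),
    "BladeStore system with two tape copies": (4, 4),
    "SCSI": (3, 5),
    "Fibre Channel": (7, 15),
}


def cost_range_dollars(capacity_tb: int,
                       cents_per_mb: Tuple[int, int]) -> Tuple[float, float]:
    """Return the (low, high) cost in dollars for the given capacity."""
    low_cents, high_cents = cents_per_mb
    megabytes = capacity_tb * MB_PER_TB
    return megabytes * low_cents / 100, megabytes * high_cents / 100


for capacity_tb in (4, 12):  # current and planned BladeStore capacity
    print(f"{capacity_tb} TB:")
    for option, cents in CENTS_PER_MB.items():
        low, high = cost_range_dollars(capacity_tb, cents)
        if low == high:
            print(f"  {option}: ${low:,.0f}")
        else:
            print(f"  {option}: ${low:,.0f} to ${high:,.0f}")
```

At 12 terabytes, for example, four cents a megabyte works out to $480,000, against $840,000 to $1.8 million at the quoted Fibre Channel rates.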

But Banta still is looking for a system for long-term storage of automated medical records.

“The BladeStore is not considered to be a hard enough WORM media for the authoritative record, so that still goes to our optical FileNet [Enterprise Content Management system],” Banta says. “We are exploring devices to make this true WORM through very strict tracking mechanisms, but it’s not passing our muster now. The BladeStore is simply a long-term deep archive.”

Object-oriented storage

Enter Centera, an object-oriented storage system that EMC introduced last year.

Traditionally, storage is viewed as either blocks or files of data that are subject to being retrieved from a specific location and media type. Block-oriented data resides on Fibre Channel storage-area networks and direct-attached storage; file-oriented data on network-attached storage.

In object-oriented storage, each piece of data is represented as an object and automatically is assigned a unique digital identifier or fingerprint. The fingerprint is used to retrieve the object, irrespective of its location and placement, whether on tape, spinning disk or ATA media. As data moves from disk to tape during its life cycle, its fingerprint, sometimes called metadata, tracks its location, so that it can be retrieved quickly and so that related data objects, such as X-rays and test results for a patient, can be correlated coherently.

The same digital fingerprint identifies not only the location of the data but its character. For instance, an X-ray that is stored on optical media could be associated with a keyword in a document management system and from there to the patient’s chart and prescription information.
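The idea of retrieving an object by fingerprint rather than by location can be sketched in a few lines. What follows is a toy illustration, not Centera's implementation or API: the article does not say how Centera computes its identifiers, so the use of SHA-256 is an assumption, and the class, method and tier names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectRecord:
    location: str  # hypothetical tier label, e.g. "ata-disk" or "tape"
    metadata: Dict[str, str] = field(default_factory=dict)


class ObjectStore:
    """Toy store: objects are looked up by fingerprint, not by path."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._records: Dict[str, ObjectRecord] = {}

    def put(self, data: bytes, location: str = "ata-disk", **metadata: str) -> str:
        # Assumption: SHA-256 stands in for the unspecified fingerprint scheme.
        fingerprint = hashlib.sha256(data).hexdigest()
        # Identical content yields the same fingerprint, so it is kept once.
        self._blobs.setdefault(fingerprint, data)
        self._records.setdefault(fingerprint, ObjectRecord(location, dict(metadata)))
        return fingerprint

    def migrate(self, fingerprint: str, new_location: str) -> None:
        # Moving an object changes where it lives, not how callers ask for it.
        self._records[fingerprint].location = new_location

    def locate(self, fingerprint: str) -> str:
        return self._records[fingerprint].location

    def get(self, fingerprint: str) -> bytes:
        data = self._blobs[fingerprint]
        # Re-hash on read to confirm the content is unaltered.
        if hashlib.sha256(data).hexdigest() != fingerprint:
            raise ValueError("stored object no longer matches its fingerprint")
        return data

    def find(self, **criteria: str) -> List[str]:
        """Return fingerprints whose metadata matches every given pair."""
        return [
            fp for fp, rec in self._records.items()
            if all(rec.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
xray = store.put(b"<x-ray image bytes>", patient="12345", kind="x-ray")
labs = store.put(b"<test results>", patient="12345", kind="lab-results")

store.migrate(xray, "tape")            # the object ages off disk
print(store.locate(xray))              # its location changed
print(store.get(xray) == b"<x-ray image bytes>")  # same fingerprint still retrieves it
print(len(store.find(patient="12345")))           # related objects are found via metadata
```

Because the identifier is derived from the content, a read can also verify that the object has not been altered, which is relevant to the compliance requirements discussed earlier.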

Robert Terdeman, senior vice president and CTO for Rogers Medical Intelligence Solutions in New York City, chose Centera to store the volumes of clinical information Rogers sells to pharmaceutical and biotechnology firms and healthcare professionals. He combines it with Documentum’s enterprise content management platform, which organizes the data before handing it off to Centera for storage.

“A lot of our collateral comes in on paper,” Terdeman says. “We take it and extract key words and store the collateral on the EMC Centera. We are constantly asked to go back and look for this piece of information or that document, and we can never throw it away.

“Data could be relevant 10 or 12 years down the road,” he says. “For instance, with our Retrospective Data Analysis, you can search back 12 years on urinary tract infections. We have the largest repository of unpublished medical information in the world.”

Before using Centera, Rogers had cabinets of paper records that took hours to dig through to extract information. They weren’t easily accessible to people who needed information.

A year ago, Terdeman started to redesign the information network for the company, which employed 100 people and was unprofitable.

With Centera, Terdeman says, the company went from negative profitability to “roughly $2 million profitability. We reduced the head count from 100 to 71 solely based on the implementation of scanning and the Centera technology. The five terabytes of the Centera was less than the fully loaded support cost of one technician,” he says. “By that itself, it was justified.”