

High Semantic Storage

Retention Dates? Purge Dates? What Are You Talking About, Boss?

By Al Riske

04.Sept.07 - Brian Wong wants to liberate data storage.

For 40 years, storage has been a slave to the computer, doing only the most menial tasks and not allowed to even know why.

Wong, a Distinguished Engineer in Sun's software group, sees that as a huge waste.

He describes the current paradigm this way: "When I say write block 26, you take this bag of 500 bits and you stick it right there in block 26, and when I ask for it, you give it back to me."

Now, that clearly works, but ...

"What it doesn't do," Wong says, "is it doesn't give the storage much opportunity to help."

It can't help you, for example, if you want to make sure that your financial data is retained for 10 years from the last time it's modified.

"Ask a disk drive to do that and, if it could talk," Wong says, "it would say, 'Uh, boss, what's data? What is a date? What does it mean to delete?'"
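A toy sketch makes Wong's point concrete. The hypothetical `BlockDevice` class below (an illustration, not any real driver API) exposes only the contract he describes: put these bytes in block N, and hand them back later. It assumes 512-byte blocks. Nothing in the interface carries a notion of a file, a date, or deletion, so there is nothing to attach a ten-year retention rule to.

```python
class BlockDevice:
    """Hypothetical block device: fixed-size blocks addressed only by number."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, n: int, data: bytes) -> None:
        # The device checks only that the payload fits. It cannot know
        # whether these bytes belong to a file, who owns them, or how
        # long they must be kept.
        if len(data) > self.block_size:
            raise ValueError("payload larger than block")
        self.blocks[n] = data.ljust(self.block_size, b"\x00")

    def read_block(self, n: int) -> bytes:
        # Give back exactly what was stored, padding included.
        return self.blocks[n]


dev = BlockDevice(num_blocks=64)
dev.write_block(26, b"quarterly ledger")
print(dev.read_block(26)[:16])  # b'quarterly ledger'
```

Asking this device to keep the ledger for ten years cannot even be expressed: the only verbs it knows are `write_block` and `read_block`.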

So, he points out, what worked for 40 years is too limited to take us where we need to go.

Not with businesses creating huge stores of data, buying tape cartridges one million at a time, and leasing salt mines to store them in.

Not with governments around the world legislating detailed requirements for financial data and medical records.

Not with increasing concern among consumers about privacy and the security of their personal information.

Which is why Wong is working on something he calls "high semantic storage."

While the storage problem can be described rather simply, attempts to solve it have (so far) been, well ... less simple.

"We used to have this notion that, well, you can go and modify the file system so that it just does this stuff. And you can do that. You can modify your file system so it understands the notion of retention dates and all sorts of things. But now you have the problem that you can't share that data with other computers," Wong says.

"We could change all our Solaris file systems, for example, but, first of all, that's a lot of work because there are actually a lot of Solaris file systems. Then someone would say, 'Actually, I really wanted you to do that in a cluster.' Clusters are important, so now we've got to go change not only the file system, but we have to make all those changes cluster-aware and that's kind of a pain in the neck. Again, it's something we could do because they're ours. But then someone would say, 'Gee, I want to access that data from Windows.' Now, for all sorts of technical and business reasons, we don't get to change things in Windows ... or OS X ... or AIX ... or whatever."

Clearly no one is going to solve the problem that way.


"Today, computers share storage -- the bits -- but they don't share data. That's because the semantics of which bits are which is interpreted by a file system that lives in the computer. But if you do something that's actually very simple to describe, which is, move the file system into the storage, then guess what? The data can be shared," Wong says.

"Yes, the semantics have to go over the wire, and as long as you find a way for everyone to have the same semantics -- in other words, speak the same language -- then everybody has an equal shot at using (and sharing) the data."

In other words, all you need is a standard, shared protocol.

A protocol like NFS, for example.

Wong notes that in some quarters NFS, the network file-sharing system Sun invented in 1985, carries the stigma of being old, obsolete, and slow.

Not so. Not by a long shot.

Though it may be old in the sense that it debuted more than 20 years ago, the NFS standard was revised in 1995 and again in 2004. In fact, it's being revised right now.

"It certainly isn't obsolete because obsolete means you can't do what you want to do with it. But there are all sorts of things you can do with it that you've never done before," Wong says. (More on that later.)

Is it slow?

"I hear this from users all the time. 'Gosh, I can't do that with NFS, it's way too slow.' Telling me that is picking on the wrong guy, because I always ask them, 'How fast do you need to go?' These days, NFS out of the box will go 235 megabytes per second with standard networking technology," Wong says.

With more advanced networking like InfiniBand and RDMA (remote direct memory access)?

"It only goes 980 megabytes per second. Sorry."

Only 980?

"We actually don't know how fast we can go because the boxes we have in the lab ran out of hardware. That's like saying, 'I took my car down to the shop to have it dyno'ed, and the dynamometer topped out at 1300 horsepower, so I don't really know -- could be 1302 or could be 1700.'"
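To put those figures in perspective, a quick back-of-the-envelope calculation helps. The sketch below assumes a decimal terabyte (1,000,000 MB) and a sustained rate equal to the quoted figure, which real workloads may not reach; `transfer_seconds` is just an illustrative helper.

```python
def transfer_seconds(megabytes: float, mb_per_s: float) -> float:
    """Time to move a given volume at a constant rate (idealized)."""
    return megabytes / mb_per_s


ONE_TB_MB = 1_000_000  # assumed decimal terabyte, in megabytes

for label, rate in [("standard networking", 235), ("InfiniBand/RDMA", 980)]:
    minutes = transfer_seconds(ONE_TB_MB, rate) / 60
    print(f"{label}: {minutes:.1f} minutes per terabyte")
# standard networking: 70.9 minutes per terabyte
# InfiniBand/RDMA: 17.0 minutes per terabyte
```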

The big change coming to NFS is parallelism.

"Somebody said, 'What if we took a file and spread it across ten filers and told each of them to give us a piece of it -- and transfer it in parallel please? That would go much faster, wouldn't it?'" Wong explains.

He is quick to note that Sun can't take credit for the idea, though our engineers are playing key roles in developing the new parallel NFS (pNFS) standard. In fact, Sun's implementation is coming along nicely.

"You'd think, 'Well, going from inherently nonparallel to parallel -- that's a pretty big change.' And it is. But the data transfer protocols remain the same. So if you have data that's spread onto 10 filers, you send the same bits over the wire to request the data as you would have with one, except you're sending 10 requests to 10 places instead of one request to one place," Wong says.
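The striping idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pNFS protocol itself: it assumes fixed-size stripe units placed round-robin across in-memory stand-ins for filers (the `Filer`, `stripe`, and `parallel_read` names are hypothetical), whereas a real pNFS client obtains a layout from a metadata server. What it does show is Wong's point: each filer receives the same kind of read request it would get on its own, and the client simply sends one request per filer concurrently and reassembles the pieces.

```python
from concurrent.futures import ThreadPoolExecutor


class Filer:
    """Hypothetical in-memory stand-in for one storage server."""

    def __init__(self) -> None:
        self.units: dict[int, bytes] = {}  # stripe-unit index -> bytes

    def read(self, indices: list[int]) -> dict[int, bytes]:
        # An ordinary read request, covering only this filer's share.
        return {i: self.units[i] for i in indices}


def stripe(data: bytes, filers: list[Filer], unit: int) -> int:
    """Place fixed-size units round-robin across filers; return the unit count."""
    chunks = [data[i:i + unit] for i in range(0, len(data), unit)]
    for idx, chunk in enumerate(chunks):
        filers[idx % len(filers)].units[idx] = chunk
    return len(chunks)


def parallel_read(filers: list[Filer], num_units: int) -> bytes:
    """Send one request per filer concurrently, then reassemble in order."""
    n = len(filers)
    # Filer f holds units f, f + n, f + 2n, ...
    plan = [list(range(f, num_units, n)) for f in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(lambda f, idx: f.read(idx), filers, plan))
    merged: dict[int, bytes] = {}
    for part in parts:
        merged.update(part)
    return b"".join(merged[i] for i in range(num_units))


filers = [Filer() for _ in range(10)]
payload = bytes(range(256)) * 40            # 10,240 bytes
count = stripe(payload, filers, unit=512)   # 20 units, two per filer
assert parallel_read(filers, count) == payload
```

Threads are a reasonable fit because the real work would be network I/O. With a single filer the same code degenerates to one request covering every unit, which mirrors the "one request to one place" case.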

"Interestingly, the changes necessary to make parallelism happen are what enable us to do other kinds of things as well. We're talking about having data do things the way they should be done, without my having to touch it myself. It's a simple matter of programming."

He laughs.

"A simple matter of writing an awful lot of very complicated programs to do these things."


Wong became interested in storage soon after joining Sun in 1987, just five years after the company was founded.

Though he had been writing code for years, he started at Sun as a workstation salesman. Later, he moved into technical marketing, where he discovered that "for some interesting class of applications, the thing that governs performance isn't the server, it's the storage."

Wong also happens to be the guy who brought capacity planning into the UNIX market, modifying mainframe methodologies to get comparable results in an open, networked environment.

His boss eventually decided that Wong really belonged in engineering, where he has continued to tackle complex customer challenges.

"People kept coming to us and saying, 'I have this law to comply with, so I have to be sure these different things happen.' Or, 'My customers don't allow me to shut the system down so I can do a backup, so I have to figure out how to do a backup in like one second,'" he says.

And it has become clear that semantics lie at the heart of many of these problems.

"When you have semantics, you not only have a bunch of bits but you have an understanding of which bits those are -- and you can use those semantics to cause the right things to happen," Wong says.

"For example, I now know that this thing is a file. Moreover I know it's owned by you and you allow this set of people to access it. If someone else asks for access I'll say no, or I may even pretend it doesn't exist."
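Wong's access example can be sketched as a lookup that consults ownership metadata held by the storage itself. The `FileRecord` fields and the `hide_from_others` flag are hypothetical, chosen only to show the three outcomes he lists: grant access, refuse it, or behave as though the file does not exist.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical metadata a semantics-aware store might keep per file."""
    owner: str
    readers: set[str] = field(default_factory=set)
    hide_from_others: bool = False   # assumed flag: act as if the file is absent
    data: bytes = b""


def open_for_read(store: dict[str, FileRecord], path: str, user: str) -> bytes:
    """Return the file's bytes, or raise as the ownership semantics dictate."""
    rec = store.get(path)
    if rec is None:
        raise FileNotFoundError(path)
    if user == rec.owner or user in rec.readers:
        return rec.data
    if rec.hide_from_others:
        # Refuse without revealing that the file exists at all.
        raise FileNotFoundError(path)
    raise PermissionError(f"{user} may not read {path}")


store = {
    "/finance/ledger": FileRecord("alice", {"bob"}, hide_from_others=True,
                                  data=b"payroll"),
    "/eng/specs": FileRecord("carol", {"dave"}, data=b"spec"),
}
print(open_for_read(store, "/finance/ledger", "bob"))  # b'payroll'
```

An unlisted user such as `eve` gets `PermissionError` for `/eng/specs` but `FileNotFoundError` for `/finance/ledger`, which is the "pretend it doesn't exist" case.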

And that's just the beginning of high semantic storage.

"Now we can extend that even further -- and we intend to -- to do other things with those semantics. For example, retention dates. Purge dates," he says. "We can say, 'You must retain this until such and such a date and make it go away permanently after that.' We could also set it up so certain types of data are always replicated at least 500 miles away."
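Once the storage knows about files and dates, the retention, purge, and replication rules Wong lists reduce to simple checks. In this sketch the `Policy` fields are hypothetical; retention is measured from the last modification (as in the ten-year financial-data example earlier in the article); ten years is approximated as 3,652 days because `timedelta` has no year unit; and replica sites are (latitude, longitude) pairs compared with the haversine great-circle formula.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_MILES = 3958.8


@dataclass
class Policy:
    """Hypothetical per-file policy; the field names are illustrative only."""
    retain_for: timedelta                   # counted from last modification
    purge_after: timedelta = timedelta(0)   # grace period after retention ends
    min_replica_miles: float = 500.0


def retain_until(last_modified: datetime, p: Policy) -> datetime:
    return last_modified + p.retain_for


def may_delete(now: datetime, last_modified: datetime, p: Policy) -> bool:
    # Deletion requests are refused while the retention window is open.
    return now >= retain_until(last_modified, p)


def must_purge(now: datetime, last_modified: datetime, p: Policy) -> bool:
    # Once the purge date passes, the data should be destroyed permanently.
    return now >= retain_until(last_modified, p) + p.purge_after


def miles_between(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def replication_ok(primary: tuple[float, float],
                   replicas: list[tuple[float, float]], p: Policy) -> bool:
    # At least one copy must sit the required distance from the primary.
    return any(miles_between(primary, r) >= p.min_replica_miles for r in replicas)


ledger = Policy(retain_for=timedelta(days=3652))
modified = datetime(2007, 9, 4)
print(may_delete(datetime(2012, 1, 1), modified, ledger))  # False
print(must_purge(datetime(2017, 9, 5), modified, ledger))  # True
```

Keeping these as checks the storage evaluates, rather than rules each client must remember, is the point Wong makes: every host that speaks the protocol gets the same behavior.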


Brian Wong

Title: Distinguished Engineer, Software, Sun Microsystems

Expertise: Capacity planning and network-attached storage.

Claim to Fame: Brought capacity planning into the UNIX market, modifying mainframe methodologies to work in an open, networked environment.

Quote: "What we're after here is to find ways in which we can take an existing protocol like NFS and solve the set of storage problems that people have been continuously presenting to us for the past, I don't know, five or six years."

Education: Studied at the University of Virginia and Virginia Tech, but never finished.

Background: Started writing code for a small consulting firm that worked with AT&T's UNIX group and followed that with a stint at British Telecom before joining Sun 20 years ago as a salesman. He soon moved into technical marketing and finally into engineering.

Patents: 4 granted, 32 pending.

Hobbies: "Bowling. Photography. Driving race cars."

Last Book Read: The Majipoor Chronicles, by Robert Silverberg

Favorite Food: Bananas Foster.

Pet Peeve: "The pursuit of perfection as opposed to good enough. We have this in engineering all the time. You see some code and say, this is a) brilliant, b) incredibly over-engineered, c) a year late, and d) consumes twice as much CPU power as the minimum set would."

Little-Known Fact: Has bowled three perfect games.

Childhood Ambition: To be rich and idle.