Recently I had some back-and-forth on twitter with Jeff Smith (Twitter|Blog) about spreadsheets and databases, and he blogged about Excel and Oracle last week. After that plus reading an article this morning related to the use of spreadsheets in my area of work, I decided to jump in with my own post on the subject.
I have a love-hate relationship with spreadsheets. Much of my background is in accounting, economics and finance so that probably explains the “love” part. The “hate” part started early, probably with the following story.
I was working for a Fortune 100 technology company and accounting systems were all over the place, with individual departments choosing their own vendors (or cobbling together their own solution) on a wide variety of platforms. Some of them could be glued together to get a division-wide view, but not everything. Thus, a Multiplan (we’re talking 1980s here) spreadsheet was used to gather everything needed to drive reports. Whatever could be pulled from individual systems would be dumped into that spreadsheet’s cells. Data that couldn’t be extracted from a system was to be hand-entered into the spreadsheet. I had that job for a while, and the worst part wasn’t the 12″ monochrome green screen – it was that the manual input cells were located all over the place in this spreadsheet. Take a value from a microfiche and enter it in cell R213C12. Wade through a box of green bar printer paper over in the corner to get the value for cell R9C3. For the value in cell R162C9 go to the second floor and talk to Joe, you know, the guy in the cube that’s next to the woman who looks like a brunette version of Molly Ringwald. You get the picture. So after a month of this, I asked my manager if I could add an organized, clearly laid-out and labeled data entry section to the spreadsheet, and have all these stray input cells reference cells in the data entry section. Her eyes grew big and she shouted out a definitive “No, absolutely not!” As I began explaining the advantages of my idea, her face turned to a look of horror and she continued “No, absolutely not, you have no idea what all depends on this spreadsheet, no idea.” As she turned and walked away from me mumbling “no idea, no idea” over and over all the way down the hallway, I realized that I had come face-to-face with one of the evils of the spreadsheet.
Okay, so that’s a story from the bad old days, nearly a quarter of a century ago. So what? We’re getting there, just a moment.
Much of my work these days involves the investments industry, so I’ll deviate into a bit of finance for a moment. I just read an article on The Five Cs of Counterparty Risk which turns out to be Counterparty, Contract, Collateral, Concentrations & Credit analytics. Each of these items can be challenging to describe, measure, assess, etc. on their own. Evaluating all five together is more difficult, but then consider that you might have other contracts with their own Five Cs, and things might not be independent among these contracts. It gets complicated in a hurry.
What does this have to do with database technology? The article mentioned a survey which found “that 70 percent of companies use spreadsheets or internally assembled systems to manage counterparty credit risk.” It would be nice to know the breakout of that 70 percent, but I wouldn’t be surprised if spreadsheets made up the largest part.
I don’t see the spreadsheets going away. People are excited about Microsoft’s new PowerPivot technology and the idea of tying SQL Server together with Excel. If you generate reports from a database system, I’d be surprised if at least some of them aren’t requested in Excel format. Think you can refuse to produce Excel reports? Then users will go to elaborate means to get data into Excel, perhaps risking the integrity of that data. The reasons users do this are varied. They know Excel and don’t want to learn the interface you’ve provided. The interface you’ve provided doesn’t do what they want. They want to add columns, calculations or charts and it takes the development team too long to add them to reports. And so on.
We can do the data modeling right, properly design the database, get all the application components correct. The output goes into spreadsheets, where columns get added, new calculations are done, data elements are converted to other formats, macros are run, etc. For all we know, this is the actual form that is used for decision making, not the output of our well-designed system. If things go awry, if something looks wrong, someone will say “But it came from the database, it has to be right.”
So what are we to do as data professionals? I appreciate the information wants to be free concept, but just repeating that slogan as we shrug our shoulders is too easy. Without being overly restrictive, can we know what’s happening with data downstream? If we know that, are we prepared to react in a productive manner?