UnSQL Friday – Living in the Intertubez

Jen McCown (Blog|Twitter) has declared today UnSQL Friday #006. I’m going to try to sneak this one in before Friday ends, and the clock is ticking. No problem; I think it will be easy for me to say what I love about “Living in the Intertubez.”

One of the early benefits I found when I started participating in twitter was discovering great things happening within blocks of my home and work. For example, there were technology meetings with really awesome folks in my neighborhood. I had been missing out because, quite frankly, some technology associations have websites so uninspiring that you can’t imagine attending their meetings voluntarily. With twitter, however, I could follow members of these organizations and learn that they were amazing… I couldn’t wait to meet them! The internet means empowerment for the people, and a degree of independence from the formal communications of stodgy organizations and the masters pulling their strings.

But the reach of twitter went beyond my neighborhood. It meant connecting with people all around the world. Meeting people in the “real world” whom you had only known on twitter is an amazing experience, especially the first few times. You just start talking like you’ve known each other for years and pretty much pick up from your last tweet with that person. Who could have imagined such a thing was possible with only 140 characters to work with?

“Living in the Intertubez” played a huge role in my landing a job more amazing than anything I could have imagined. Actively blogging and tweeting not only kept me in touch with former colleagues, but also gave them a sense of what I had been doing to advance my skills.

This week I stepped up my “Living in the Intertubez” world by joining facebook. I’d resisted until now because an experience with another social network many years ago soured me on such things. But things change fast in “the Intertubez,” and I’m looking forward to learning the lingo of the facebook world.

Posted in General Technology | Comments Off

T-SQL Tuesday #022 – Data Presentation

September’s T-SQL Tuesday is hosted by Robert Pearl (Blog|Twitter), who has chosen Data Presentation as this month’s topic. I shuddered when I saw it, because it brought to mind an experience in which the separation of data and presentation was violated.

I received an email from the boss saying he had been querying a particular database. He wanted the leading zeros removed from all numeric values.

I said that was no problem; we’d just modify the query tool to trim leading zeros. The previous boss had wanted to see data exactly as it was stored in the database, so he didn’t want leading zeros trimmed. But no problem, Mr. Boss Du Jour, we can change that. Or better yet, we can offer it as an option in a checkbox. Because who knows what the next boss will want (okay, I didn’t say this last part, I just thought it to myself).

The boss countered, “No, don’t modify the query tool. I also want to see the data as it is stored in the database. And the way I want it stored in the database is with the leading zeros trimmed.”

I explained that there were specifications written years before detailing that numbers were to be stored in that database exactly as they were received from data providers. There were a number of reasons for this, the most crucial being that part of our service offering was the ability to rate data from our various sources by a range of metrics. This particular database was used for that product offering. Modifying the numeric values by removing leading zeros would corrupt that process.

In the end, none of my arguments mattered. I could tell it had become a point of pride for the boss to win this no matter what. I asked why this was so important; for example, did he have storage or performance concerns? But his answer was just “It looks better without leading zeros.”

After that, there were questions I thought about asking but didn’t. Like whether he thought the database would look nicer at the byte level. Or whether he had an opinion on which looks prettier, big-endian or little-endian. Or whether he’d be happier with a mauve database.

Or how we were supposed to put the zeros back when the next boss asked for them.

Anyway, I know I mentioned this in a prior post, but it’s probably worth repeating: when asked to do certain tasks (such as those involving presentation), at least consider whether the work belongs in the database or at another level.
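To make the storage-versus-presentation point concrete, here is a minimal Python sketch. It assumes the values arrive from data providers as unsigned digit strings, and the function name `display_value` and the sample values are hypothetical. The stored value is never touched; trimming is a display-time option (the checkbox idea), and the last lines show why trimming in storage is lossy.

```python
def display_value(stored: str, trim_leading_zeros: bool = False) -> str:
    """Return a value for presentation without modifying what is stored.

    Assumes `stored` is an unsigned digit string exactly as received
    from a data provider (hypothetical format, for illustration only).
    """
    if not trim_leading_zeros:
        # Default: show the data exactly as it is stored.
        return stored
    trimmed = stored.lstrip("0")
    # An all-zero value such as "000" should still display as "0".
    return trimmed or "0"


# Hypothetical values as received from different data providers.
stored_values = ["00420", "0420", "420", "000"]

for value in stored_values:
    print(value, "->", display_value(value, trim_leading_zeros=True))

# Trimming in storage would be lossy: three distinct stored values collapse
# into one, so there is no way to "put the zeros back" for the next boss.
collapsed = {display_value(v, trim_leading_zeros=True) for v in stored_values[:3]}
print("distinct values after trimming:", len(collapsed))
```

Because the trim happens only in `display_value`, either boss can get what he wants from the same stored data.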

Thanks to Robert Pearl for hosting T-SQL Tuesday #22, especially when he was asked to host earlier than planned! And thanks to Adam Machanic (Blog|Twitter) for creating this monthly blog event and keeping it going!

Posted in T-SQL Tuesday | 4 Comments

Labor for the Data

Happy Labor Day! My last post on Google Correlate is still on my mind. So is Buck Woody’s (Blog|Twitter) post on being a Data Professional … yes, this is at least the third time I’ve mentioned Buck’s post, but I think the message is that important to grasp. Anyway, what’s on my mind are two encounters I had that, at least for me, indicated I should not get so involved in technology that I forget about the data.

The first encounter was seventeen years ago; the second was ten years ago. So I have two stories, and since it’s a holiday you probably feel like reading just one. Okay, I’ll go with the more recent story for now and save the older one for a follow-up blog post.

It was my first day at a new job. I had graduated with a computer science degree the previous day (I know, I should have taken some time off in between, but I was excited about the new job). My boss came over to my desk to welcome me, and the exchange went something like this:

  • Boss: Hi Noel, we’re glad you’re finally here.
  • Me: Same here, I received my degree yesterday, so I’m ready to get to work!
  • Boss: You were the one who wanted to wait until you graduated to start work. I was ready for you to start working here months ago. Quite frankly, I don’t care about your computer science degree, it’s the skills from your previous graduate study that are interesting.

Ouch, talk about a backhanded compliment. I’d just spent over two years, and boatloads of SQLCruises’ worth of dollars, studying algorithms, programming languages, relational algebra, software engineering techniques, etc. How could he not value that? How could he find the five years of grad school I did beforehand more interesting, especially when half of those five years didn’t even result in a degree?

After a few years, I not only understood his viewpoint but even agreed with it. Well, mostly agreed with it. I’d never want to give up the computer science study (especially the algorithms material), and when I try to think of parts I could have cut out, I don’t come up with much.

So what was it about the five years of non-computer science grad school that made my boss interested in me?

The answer: data. I spent all that time rolling around in data.

The first graduate degree was in social and applied economics. The focus was on applying data, statistics and economic theory to public policy and business issues. I was also a research assistant, so days and nights revolved around loading data tapes on mainframes, crunching away at them with SAS, then taking the resulting greenbar printouts to a professor’s office. That’s where the real education started: pore over scatter plots and regression lines, dig into data rows to find points that didn’t fit, figure out what was missed and what the data told us that we didn’t know, adjust models, then head back to the computer center and repeat.

After that, I taught economics for a few years, then I headed back to graduate school to work on a doctorate (I became bored with being in a small college town, plus my interests changed, so I left without a degree during my third year). Once again I was a research assistant while taking courses in quantitative methods, economics and accounting, so my experience was similar to the above. Similar but not the same: by that time, PCs had become powerful enough that you no longer needed a mainframe to run SAS on large data sets. That meant you spent more time with the data… it was right there on your desktop, so you didn’t have to run back and forth to the computer center!

That’s the end of the story. My take-away: it occurs to me that in recent years I’ve spent disproportionately more time learning about the technology side than the data side. So I’m consciously going to try to balance that out in the coming months. With that, I guess it’s time to find my copy of Hogg and Craig. Oh there it is, underneath my monitor stand :-)

Posted in Professional Development | Comments Off

Drawing with Google Correlate

This week, Nick Hatch (twitter) showed me the Search by Drawing feature in Google Correlate. My reaction was “That’s going to consume several hours of my weekend” and sure enough, it already has :-)

Getting started is easy. Go to Google Correlate (you’ll need to sign in with your Google account), then click the Search by Drawing link on the left side of the page, under the Correlate Labs section. You’ll be presented with a blank chart where you can draw a time series of search activity… just draw a line, click the Correlate! button and see what happens. The tool displays your line alongside the search activity for a matching search term. The most highly correlated term is displayed first, but other terms are listed as well (ranked by decreasing correlation), and you can click on those terms to display their lines.
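For the data geeks, the “ranked by decreasing correlation” idea can be sketched in a few lines of Python. This is not Google Correlate’s actual implementation; it assumes Pearson correlation as the similarity measure (the kind of figure behind a number like 0.9756), and the search terms and series below are made up for illustration.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A perfectly flat series has no variance, so r is undefined.
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_terms(drawn: list[float], series_by_term: dict[str, list[float]]):
    """Return (term, r) pairs sorted by decreasing correlation with the drawn line."""
    scores = [(term, pearson(drawn, series)) for term, series in series_by_term.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical drawn line and hypothetical search-activity series.
drawn = [1, 2, 4, 8, 16]
series_by_term = {
    "term_c": [16, 8, 4, 2, 1],  # moves opposite to the drawing
    "term_a": [2, 4, 8, 16, 32],  # same shape, different scale
    "term_b": [5, 5, 5, 5, 6],  # mostly flat, small bump at the end
}

for term, r in rank_terms(drawn, series_by_term):
    print(f"{term}: {r:.4f}")
```

Note that correlation ignores scale: `term_a` matches the drawing perfectly even though its values are twice as large, which is why a hand-drawn line with no units can still find matches.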

You might be wondering how I could spend hours drawing lines. Fair enough.

I started thinking about search terms, imagining what the time series for a term would look like, drawing it, then looking at what my line actually matched. For example, try to think of the number of searches over the last several years for Lady Gaga or Charlie the Unicorn, then draw that time series and see the results.

Some of my more interesting attempts:

  • The line I drew that I thought might look like the time series for MusicMatch had a 0.9756 correlation with AltaVista. This was interesting because Yahoo ended up acquiring both of these products.
  • The line I drew trying to guess search activity for FarmVille correlated most highly with Dropbox. This wasn’t too interesting, but the data had something unexpected. As I looked at the time series of search activity for Dropbox, there wasn’t a surprise in the general trend. Dropbox was founded in 2007, so sure enough in 2007 the time series shot up exponentially. Before that, the search activity was perfectly level, except for little squiggles around 2005. Hmmm…
  • I drew my imagined time series for searches on the mortgage banking industry, and my line’s highest correlation was with a bank that was purchased after the 2008 banking crisis. However, well before the banking crisis, there was a sudden and dramatic peak in search activity for that bank. With a little searching, I read that the bank encountered regulatory activity during the period of that peak. So looking at the data prompted me to do a little digging and learn something.

Anyway, if you’re a data geek and haven’t poked around with Google Correlate yet, then you might want to check it out. Enjoy!

Posted in Data | 1 Comment

Netbook – Part 3

Thirteen months ago I wrote Part 1 about my experience with choosing a netbook to take on the very first SQLCruise. Part 2 continued with setting up the netbook for SQL Server development and education use.

Thoughts After a Year of Use

So after more than a year of use, what’s the verdict? Easy answer: success, no regrets.

The netbook went not only on the very first SQLCruise from Miami to Cozumel, but also on the most recent SQLCruise to Alaska. On both trips I used it in the classroom as well as to VPN into the office to do work. Same for last year’s PASS Summit and some SQL Saturday events. For a year I carried it to the office so that I could work away from my desk for an hour or so each day. The netbook cost around $300, gives off very little heat compared to a laptop, and weighs only 2.8 pounds. Most surprising: the estimate of 14 hours of use on battery power was not much of an overstatement (turning on wi-fi zaps it down a bit, turning on bluetooth zaps it down a lot).

So is that the end of this post? Am I going to end with “All is well, tune in next year when I will let you know if the netbook is still running” or something like that?

No.

Let’s push on and take the netbook to a new place: a wonderful place with everyone’s favorite creature of the arctic waters, the narwhal. In this particular case, a natty narwhal.

It’s Ubuntu Time

Ubuntu is a linux distribution. Unlike in the olden days, you can avoid the disk partitioning stuff, boot loader configuration, command lines, and yet another operating system install where you have to babysit the machine in case you need to type in information and hit the enter key. More on that later.

But first, why would I want linux on my netbook if it already works fine? Because sometimes I just want to grab my netbook and do some web browsing without waiting for Windows 7 to boot. So I wondered whether a minimal Ubuntu install would boot faster on my netbook. No need for suspense… the boot time is about the same, but what I like is that the Ubuntu user interface works very nicely on smaller, netbook-size screens. It seems a bit awkward at first, but it doesn’t take long before you begin to appreciate it.

One way to get Ubuntu running would have been to use a virtualization solution such as VMware Player. But that’s not going to work in this case because then I’d have to boot to Windows first, then Ubuntu. That’s not much of a time-saver. Also, my netbook doesn’t have a lot of memory or CPU horsepower for pushing virtual machines. So it was looking like I would be setting up a dual-boot configuration.

At this point, I decided to try the option of installing Ubuntu inside Windows. This turned out to be very simple, and so far I’ve been quite satisfied with the result. With this option, Ubuntu installs into a folder in your Windows file system, and the experience is similar to installing a Windows application, so you don’t have to partition your hard disk for a separate operating system. The Ubuntu installer uses your existing Windows installation to figure out which settings to use, which minimizes your interaction during installation.

If you want to try out Ubuntu then you can follow the instructions here, plus this page has more detailed instructions on installation as well as how to uninstall (you uninstall from the Control Panel just like you’d uninstall a typical Windows application).

Once you have Ubuntu installed, fire up the web browser and do some reading. A couple of links I’d recommend would be

So my netbook has become even more useful now. When I start it up, I can choose to boot into Windows to do some SQL Server work, or boot into Ubuntu for some linux goodness.

Posted in Hardware | Comments Off