Thinking Books

I was a research assistant at two universities in the 1990s, so I spent a lot of time loading and crunching data on mainframes with SAS. Then I’d take stacks of greenbar fanfold printouts to my bosses (professors and/or policy institute types with PhDs), and we’d pore over summary statistics, plots, regression output, etc. I enjoyed seeing the way their minds worked as they interpreted the results. They had a way of questioning what appeared to be strong results, framing stories about what was actually happening, and identifying the next steps in the analysis to send me off to do.

As I read current books on data analysis, data science, big data, whatever, there always seems to be a hand-wave when it comes to interpreting output, framing, questioning, discrimination, etc. Something like “This step is beyond the scope of this book, go figure it out somewhere else, now let’s get back to R and Python code.” I know that the “somewhere else” comes in part from experience with the subject matter, but it seems to me there should be some structured, if not formal, way to approach building this experience.

So over the past year I accumulated a list of books toward this goal. If I read an article that mentioned a book relevant to thinking about interpreting data, I’d put that book on the list. Christmas came last month and brought some sweet Amazon gift certificates my way, so I bought the books on my list and dug into them. They’ve been great! By now, I’m pretty sure people around me are already tired of hearing me say things like “You know, according to Kahneman…” or “Nate Silver has a related story to that” or “Nassim Taleb would tell you that…” Anyway, the books are listed below from most favorite to least, but each of them was good enough that I’d buy and read it again.

The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t – by Nate Silver. This was probably the first book that started me down the rabbit hole of this list since I regularly read Silver’s 538 blog.

Fooled by Randomness – by Nassim Nicholas Taleb. This is older than his popular Black Swan book but I liked it more. This book’s theme is that events we think have a cause may just be due to chance.

Thinking, Fast and Slow – by Daniel Kahneman. If you have a background in psychology or economics (or, like me, both), then you’ve probably heard “Kahneman & Tversky” muttered by professors several times. This book deals with fast and slow thinking, aka System 1 and System 2 thinking, or thinking about dealing with a bear versus deciding whether to get a data science degree.

The Black Swan: The Impact of the Highly Improbable – another Nassim Nicholas Taleb book, this one on how to not be a turkey.

How We Know What Isn’t So: The Fallibility of Human Reason in Everyday Life – by Thomas Gilovich. Gilovich worked with Kahneman but there’s almost no overlap with Kahneman’s book above, so get them both.

Thinking in Time: The Uses of History for Decision-Makers – by Richard Neustadt & Ernest May. The authors intended this as a book for policy makers and government employees, but I think the material generalizes to any situation.

I’d enjoy hearing your recommendations on other similar books I could add to my reading list. One that I’ve been considering is Thinking with Data by Max Shron, so if you’ve read it please let me know what you thought about it.

Posted in Books, Data, Professional Development

GUI Phooey

Sometimes a complex undertaking can be avoided by asking the right questions. The best example of this for me was when a client asked me to meet with the head of operations to discuss a potential project; he wanted a GUI for one of his department’s batch processes.

So the head of operations takes me to his office to show me the problem. He fires off the batch process from a command line, then tells me “Now I wait 2 hours for it to finish. For those 2 hours the operations team has no idea what stage it’s at, how far it’s progressed, how much longer it might take, etc. Plus, people drive us crazy for those 2 hours stopping by and asking how far the batch job has progressed, why is it taking so long, when will it be done, etc. and we have nothing to show them.”

The request made a lot of sense to me, but I also knew how much work it would take to create such a GUI or dashboard, as well as how much new complexity it would add to the system. So I asked the head of operations the following questions…

Me: Would you need this GUI solution if the batch completed in 2 minutes instead of 2 hours?

Ops: No, I wouldn’t need anything else!

Me: What if it completed in 10 minutes?

Ops: No, that would be fine.

Me: How about if it completed in 15 minutes?

Ops: Hmmm, I’d have to think about that, maybe, I’m not sure.

Now I knew the pain threshold. So I asked for a week to try getting the batch completion time down to 10 minutes just by performance tuning the database operations. Since the client had been anticipating a month of work to build the GUI, I got the go-ahead to try.

Of course, it took very little time to find a frightening query that was consuming most of the 2 hours. After a couple of days I got the batch completion time to under 10 minutes just by tuning some SQL.

I know it annoys people when I ask questions and push back on their ideas, but it can pay off!

Posted in General Technology

Ops Books


These books are never far from my desk

My operations management experience was needed for a consulting project last year. I had been the head of a technology team before joining President Obama’s re-election campaign in 2011, plus I worked in management early in my career, so I had some books nearby to help get my head back into the operations and management realm.

While thumbing through these books looking for excerpts or chapters that might help, I found four of them that I decided to re-read from cover to cover. Even though some of them were published years ago, their material still seems relevant and useful. These four books plus a couple more are listed below.

Web Operations: Keeping the Data on Time by John Allspaw & Jesse Robbins – This is one of those collections of essays by various experts that O’Reilly seems to like churning out. For example, Baron Schwartz wrote the chapter on databases. In other words, this is a good book.

Scalable Internet Architectures by Theo Schlossnagle – Examples in this book involve very large systems, but this is a good book to have around even if you aren’t working with large environments. I’d also say that it’s a useful book for admins, developers and management.

Release It!: Design and Deploy Production-Ready Software by Michael T. Nygard – The first three sections on stability, capacity and design build up to the final section on operations. It’s easy to think this is a book on software development from its title and description, but it’s a valuable ops book as well.

The Visible Ops Handbook by Kevin Behr et al. – The subtitle is “Implementing ITIL in 4 Practical and Auditable Steps” where ITIL stands for Information Technology Infrastructure Library. Sounds like a real page-turner, right? Actually, it is… I’ve highlighted something on almost every page.

The Goal: A Process of Ongoing Improvement by Eliyahu Goldratt – This book is a vehicle for the author’s Theory of Constraints (TOC), which is discussed in The Visible Ops Handbook, so I’ll list it here as well. The Goal was already an old book when I first read it in 1995. It’s written as a novel about the manager of a plant where everything is always going wrong, and about the lessons he learns from a scientist friend who is trying to help him see non-intuitive ways to solve his problems.

Learning from First Responders: When Your Systems Have to Work by Dylan Richard – Dylan was one of my amazing managers at President Obama’s re-election campaign. Also, this book is free. Need I say more?

Posted in Books

Git ‘store’ credential helper with encrypted partition

Tonight I was setting up git on a new Linux box so that it could access GitHub. I enabled two-factor authentication on my GitHub account almost a year ago; some great instructions for doing this are available here. I had been using “credential.helper cache” to store my credentials on Linux machines, but this is a temporary store that by default caches credentials for 15 minutes. I could increase that default, but it’s still going to be temporary. On my MacBook, I use the OS X Keychain to store these credentials permanently, which has unfortunately made me lazy. I wanted a way to store these credentials safely on my Linux box so that I didn’t have to type them in repeatedly.
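For reference, the cache helper takes its timeout in seconds through a --timeout option (the default is 900 seconds, the 15 minutes mentioned above). Below is a minimal sketch that raises it to one hour. The throwaway HOME is my own addition so the sketch can be tried without touching a real ~/.gitconfig. Drop the two export lines to apply the setting for real.

```shell
#!/bin/sh
# Sketch: lengthen the 'cache' helper's timeout (git's default is 900 seconds).
# Assumption: a throwaway HOME keeps a real ~/.gitconfig untouched;
# remove the two export lines below to apply the setting for real.
set -eu

export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"

# Keep cached credentials in memory for one hour instead of 15 minutes.
git config --global credential.helper 'cache --timeout=3600'

# Show what ended up in the [credential] section of ~/.gitconfig.
git config --global --get credential.helper
```

Even at an hour, cached credentials disappear when the timeout expires or the cache daemon exits, so this still doesn't give permanent storage.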

This led me to the git-credential-store helper. It stores credentials on disk; however, they are stored unencrypted, in plain text. So I began looking for an alternative. I wondered if I could use the GNOME keyring with git. A search suggested that this might be possible but wouldn’t be easy. Then it occurred to me that I have an encrypted partition on this machine that uses dm-crypt plus LUKS and is mounted under my home directory. If the git-credential-store helper stored credentials in this encrypted partition, that would provide some protection. There are still vulnerabilities, but I carried on.

Beware: I am NOT a security professional so what I am doing here might be horrible advice. It is quite possible that I have no idea what I am doing.

The git-credential-store helper has a “--file=” option that can be used to specify the file where credentials are stored. I set this to a file in my encrypted partition. The default is “~/.git-credentials”, so I kept that file name and replaced the “~” with the encrypted directory path (in this example, “/home/myname/encrypted/”). Let’s say that git is configured with the commands below.

$ git config --global user.name "MyName"
$ git config --global user.email "me@somewhere"
$ git config --global credential.helper 'store --file=/home/myname/encrypted/.git-credentials'

As a result, the “~/.gitconfig” file should look something like the below snippet.

[user]
	name = MyName
	email = me@somewhere
[credential]
	helper = store --file=/home/myname/encrypted/.git-credentials

If the “/home/myname/encrypted/.git-credentials” file doesn’t exist, it will be created the next time git asks for credentials and the login succeeds. If you use two-factor authentication, remember that your personal access token goes in the password field, not your regular GitHub password. After that, credentials should not have to be entered again (assuming, of course, that the encrypted partition is mounted at the same mount point).
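To see what the store helper actually writes, here is a sketch that drives it directly with git’s `git credential approve` and `git credential fill` plumbing commands instead of a real push or pull. It makes several assumptions. A scratch directory stands in for the encrypted mount. git is isolated from the real global and system config. The username/token pair (`myname` / `ghp_exampletoken`) is made up for illustration.

```shell
#!/bin/sh
# Sketch: exercise the 'store' helper with --file= without touching real config.
# Assumption: "$scratch/encrypted" is a hypothetical stand-in for the LUKS mount.
set -eu

scratch="$(mktemp -d)"
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"   # stay outside any repository

# Isolate git from real global/system config so only our helper runs.
export HOME="$scratch/home"
export XDG_CONFIG_HOME="$HOME/.config"
export GIT_CONFIG_NOSYSTEM=1
export GIT_TERMINAL_PROMPT=0   # fail instead of prompting if a lookup misses
mkdir -p "$HOME"

cred_dir="$scratch/encrypted"
mkdir -p "$cred_dir"
cred_file="$cred_dir/.git-credentials"

git config --global credential.helper "store --file=$cred_file"

# Simulate a successful login: git hands the credential to the helper.
# With two-factor auth, the token is what goes in the password field.
printf 'protocol=https\nhost=github.com\nusername=myname\npassword=ghp_exampletoken\n\n' |
    git credential approve

# The store helper writes one https://USER:TOKEN@HOST line per credential, in plain text.
cat "$cred_file"

# Later lookups are answered from the file instead of prompting.
printf 'protocol=https\nhost=github.com\n\n' | git credential fill
```

The plain-text token in that file is why it matters that the file lives on the encrypted partition rather than at the default ~/.git-credentials.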

Posted in General Technology

SQLFriends Lunch How-To Guide

As lunch “how-to” guides go, this one is a bit different. Nothing here regarding which fork to use for each course, what to do with your napkin when you get up from the table, or how to create a distraction so that you can check if there’s spinach in your teeth by using your spoon as a mirror. No, none of that. Instead I’m going to share three simple things I wish I had done while attending last month’s SQLFriends Lunch.

But first, a bit of background. SQLFriends is a community event organized by Aaron Lowe with an emphasis on discussion and networking. The inaugural event was a lunch in downtown Chicago last month. You can read Aaron’s own review of how it went here, and Bob Pusateri reported on his experience at the event here. It was a sold-out event, and from observing the buzz in the room it seems to have been a fantastic success.

I had a great time getting to meet some folks who aren’t able to make it to the evening downtown SQL Server User Group meetings, discussing SQL Server issues, asking questions, etc. Before I knew it the event was over and we were all heading our separate ways. I really like how Aaron measured the vibe in the room and saw that people were engaged, so he just let things happen rather than trying to follow a set agenda.

Still, I wondered what I could have done to personally get more out of the event. As I reflected on this, three things occurred to me.

  1. Bring a list of questions – This event’s registration form had a box for questions you’d like answered. However, I should have gone beyond that, printed out those questions and brought them to the event. Better yet, I could have brought several copies to hand out to folks at my table or even at other tables. While we didn’t seem to run out of discussion topics, this would have been a way to get things moving if there had been a stall, and it would have been a great icebreaker for meeting new people.
  2. Business cards – Yes, this is obvious. However, it had been a busy morning and I dashed out for the event without stopping to take a quick inventory. Thus, I arrived at the event with just a few business cards in my pocket. For an event like this I should have been carrying more like 50 of them and made sure each attendee received at least one. This would have ensured that I met each person at the event. Which leads to the third item.
  3. Who’s got your back? – I was so focused on my own table that I didn’t know who was sitting directly behind me until the event was ending and people started getting up to leave. Even if it meant obnoxiously pushing my chair back and standing up, at some point, perhaps between courses, I should have looked to see whether there were familiar faces or potential new friends sitting behind me.

Now, in case you haven’t heard, registration is open for the next SQLFriends Lunch! Information is available here, and this time it’s in Lombard instead of downtown. You can follow this event on Twitter with the #sqlfriends hashtag. Also note that the upcoming SQLFriends Lunch is on Friday, May 18, 2012, the day before Chicago’s SQLSaturday #119.

Do you have additional ideas on making the most of this event? If so, please feel free to add them in the comments below!

Posted in Professional Development