Two of Elijah’s favorite TV shows are Thomas the Train and Bob the Builder. As we were watching Bob the Builder one night, I had an idea for a video mashup…
Beer Stats
Dad, Luke and I all use the Untappd app pretty regularly. Think of it like Facebook for beer. You “check in” when you have a beer and give it a rating. You can see what your friends are drinking, get recommendations, etc. I love it because it’s an easy way to keep track of all the different beers that I’ve tried and how much I like each one. It’s a great encouragement to keep trying new things.
I recently tied into their API and wrote a quick app that lets me download the data locally for analysis. There are lots of questions that I want to answer, but here are a couple fun charts to get started.
The first is a chart showing how many checkins we’ve each had by IBUs (International Bitterness Units). It doesn’t show how much we LIKE them, but it does show which ones we generally try. I’d love to expand this to show how this has changed over time.
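A chart like that boils down to counting checkins per person in IBU bins. Here is a minimal Python sketch, assuming the checkins have already been downloaded into simple records with a user name and the beer's IBU value; the `Checkin` record shape, the 10-IBU bin width, and the sample rows are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Checkin:
    user: str   # who checked in (hypothetical record shape)
    ibu: int    # the beer's IBU value

def ibu_histogram(checkins, bin_width=10):
    """Count checkins per (user, IBU bin); each bin is labeled by its lower edge."""
    counts = Counter()
    for c in checkins:
        lower_edge = (c.ibu // bin_width) * bin_width
        counts[(c.user, lower_edge)] += 1
    return counts

# Made-up rows, just to show the shape of the output.
sample = [Checkin("Ben", 65), Checkin("Ben", 68), Checkin("Luke", 20), Checkin("Dad", 35)]
for (user, lower_edge), n in sorted(ibu_histogram(sample).items()):
    print(f"{user}: bin starting at {lower_edge} IBU -> {n} checkin(s)")
```

Each (user, bin) count becomes one bar in a grouped bar chart.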
Here is one showing the cumulative number of checkins we’ve had over time. This chart is a little sloppy because each user series should have its own color, but there are only three of us so it’s not too hard to follow. If you look at the end of the chart, Luke is the top line, I’m the middle and Dad is the bottom. You might think that after a while, you run out of new beers, but if that’s true, we haven’t hit that point yet. We’re all on a pretty steady upward trajectory.
I took a look at which beers have the highest combined score from the three of us. Beers with wider distribution are more likely to win, because it helps to get to the top if all three of us have tried it. Since I’m out on the West Coast and they are in the Midwest, most of the time our beers don’t overlap unless they come from bigger breweries. Here are the top six in order.
Deschutes Brewery Red Chair NWPA
Bell’s Brewery Two Hearted Ale
New Belgium Fat Tire
Great Lakes Eliot Ness
Great Lakes Burning River Pale Ale
Deschutes Brewery Mirror Pond Pale Ale
Looking at the bottom of that list is a little silly because there are only 40 beers that all three of us have tried. I expanded it out to beers that at least two of us have had. There are 129 in that list and here are the bottom three. I bet you won’t be surprised.
Coors Light
Miller High Life
Bud Light
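Both lists come down to grouping ratings by beer and filtering on how many of us rated it. Here is a minimal Python sketch, assuming ratings are available as (user, beer, score) tuples; the tuple shape, the rule of averaging repeat checkins per person, and the placeholder beer names are all assumptions, since the post doesn't spell out exactly how the scores were combined.

```python
from collections import defaultdict

def combined_scores(ratings, min_raters):
    """Rank beers by the sum of each person's average rating.

    `ratings` is an iterable of (user, beer, score) tuples. A user who checked
    in the same beer several times contributes the average of those scores.
    Only beers rated by at least `min_raters` distinct users are kept.
    """
    per_user = defaultdict(list)               # (beer, user) -> list of scores
    for user, beer, score in ratings:
        per_user[(beer, user)].append(score)

    totals = defaultdict(float)                # beer -> combined score
    raters = defaultdict(int)                  # beer -> number of distinct users
    for (beer, _user), scores in per_user.items():
        totals[beer] += sum(scores) / len(scores)
        raters[beer] += 1

    eligible = [(beer, total) for beer, total in totals.items() if raters[beer] >= min_raters]
    return sorted(eligible, key=lambda item: item[1], reverse=True)

# Made-up ratings with placeholder beer names.
ratings = [
    ("Ben", "Beer A", 4.5), ("Luke", "Beer A", 4.0), ("Dad", "Beer A", 4.25),
    ("Ben", "Beer B", 2.0), ("Luke", "Beer B", 1.5),
    ("Ben", "Beer C", 5.0),
]
print(combined_scores(ratings, min_raters=3))       # beers all three have tried
print(combined_scores(ratings, min_raters=2)[-3:])  # bottom of the "at least two" list
```

A sum rewards beers that more people have tried, which matches the wide-distribution effect described above. For the "at least two of us" list, a beer only two people rated can't reach a three-person total, so averaging per beer may be the fairer ranking there.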
I’m looking forward to playing around a lot more with this data. If you want access too, just let me know.
Coding At Home
When people ask me how to get a job where I work, I tell them to be curious and experiment at home. I’ve learned so many valuable things from tinkering with code at home and it has made a big, positive impact in my career.
In the past it has been a lot of website stuff, but since my move to WordPress, that has pretty much disappeared. Lately it has been what a computer science geek would call “ETL” (Extract, Transform, Load) programs. I enjoy taking my data out of one system and storing it in a common place that I control. For example, my ecobee thermostat logs a ton of data every day, but it lives on ecobee’s servers. I use their programming interface to download the data and store it in my own database. I do the same thing with my irrigation controller, fantasy football and a few other gadgets and services. There’s not a lot you can do with the data right at the beginning, but as time goes on, you build up a pretty large data set and can start to see some interesting trends. As I move into more of a data science position, it’s great to have a big dataset that’s interesting to me personally.
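The extract, transform, load pattern is small enough to show in one piece. This is a minimal Python sketch, not the actual programs described above: the API response is a hypothetical hard-coded JSON string standing in for a real device call, and SQLite stands in for whatever database you use.

```python
import json
import sqlite3

# Hypothetical API response; a real device API will have its own shape and auth.
SAMPLE_RESPONSE = '[{"date": "2015-07-05", "minutes": 12}, {"date": "2015-07-06", "minutes": 0}]'

def extract():
    """Stand-in for calling a device's web API and parsing the JSON it returns."""
    return json.loads(SAMPLE_RESPONSE)

def transform(records):
    """Keep only the fields we care about, as (day, runtime_minutes) tuples."""
    return [(r["date"], int(r["minutes"])) for r in records]

def load(conn, rows):
    """Upsert rows keyed on the day, so re-running the job never duplicates data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sprinkler_runtime ("
        "day TEXT PRIMARY KEY, runtime_minutes INTEGER NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sprinkler_runtime VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
for _ in range(2):  # run the job twice to show it is idempotent
    load(conn, transform(extract()))
print(conn.execute("SELECT COUNT(*), SUM(runtime_minutes) FROM sprinkler_runtime").fetchone())
```

Keying each table on the day (or timestamp) is what makes it safe to run a job like this on a schedule: overlapping downloads just overwrite the same rows.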
Learning is a lot more fun when you’re doing it to achieve a goal that interests you. So why not find interesting things to learn that can benefit you in your job?
Reminders
“Cortana, remind me on Saturday to ride the monorail.” I apparently use reminders on my phone so often that phrases like that come out of my three year old’s mouth. Whether you have iOS, Android, Windows Phone, or even Windows 10, you should get to know the reminder capabilities. On my phone I just hold down the search button, wait for Cortana to start listening and then tell her what I want to be reminded of. That reminder can be triggered at a specific time, a place (via GPS), or when I’m calling or texting someone. It took a little while to get used to using it, but now I’d have a hard time living without it. Our brains are terrible at remembering to-do lists. Offload it to the computer in your pocket!
Azure IP Blocking
A few days ago I noticed that my website was getting a little sluggish. It wasn’t much, but it wasn’t as snappy as usual. I checked the website dashboard on Azure and noticed that the number of requests to my server was steadily rising. A quick scan of the logs revealed that a single IP was flooding my server with requests. It was trying to hit the URL where I used to serve up CascadeSkier data, so I doubt it was an intentional DoS attack, but the effect was ramping up to be similar.
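A "quick scan of the logs" can be as simple as counting requests per client IP. This Python sketch assumes the logs are in the W3C extended format that IIS-style web servers write, where a `#Fields:` line names the columns and the client address is in `c-ip`; the sample lines use documentation-only IP addresses and a placeholder path, not real log data.

```python
from collections import Counter

def top_client_ips(lines, n=5):
    """Count requests per client IP in a W3C extended log (IIS-style).

    The '#Fields:' directive names the space-separated columns; the client
    address lives in the 'c-ip' column.
    """
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and "c-ip" in fields:
            values = line.split()
            if len(values) == len(fields):
                counts[values[fields.index("c-ip")]] += 1
    return counts.most_common(n)

# Placeholder log lines using documentation IP ranges.
sample = [
    "#Fields: date time cs-method cs-uri-stem c-ip sc-status",
    "2016-01-10 03:00:01 GET /old-data-path 203.0.113.7 404",
    "2016-01-10 03:00:02 GET /old-data-path 203.0.113.7 404",
    "2016-01-10 03:00:05 GET / 198.51.100.20 200",
]
print(top_client_ips(sample))
```

With a real log file, pass `open(path)` instead of the sample list; one address towering over the rest is the one to block.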
I wrote previously about blocking traffic to Azure websites, but new features are always getting added so I looked around again. Indeed they have added the ability to block specific IPs. Check out the final section of ScottGu’s blog post for more info.
It was a one-line configuration file change and the results were immediately apparent. Even if you don’t understand any of the geek gibberish that I just wrote, I bet you can figure out when I implemented the change:
Ecobee3 Thermostat Review
For years I’ve had a thermostat project sitting on the back burner. The key features I wanted to build into it were:
- Ability to set the thermostat from our phones
- Support for remote sensors so the thermostat can use temperatures from around the house
- Logging of all the temperature sensors as well as the runtimes for the furnace, air conditioner and fan.
- Advanced programming capabilities such as: in the summer if it’s hot upstairs but cool downstairs, just run the fan more to circulate the air.
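The last item on that list is really just a small rule over sensor readings. Here is a minimal Python sketch of the "hot upstairs, cool downstairs" idea; the `Reading` fields, the 76°F cooling setpoint, and the 4°F spread threshold are hypothetical values for illustration, not settings from any thermostat.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    upstairs_f: float     # remote sensor temperature upstairs (°F)
    downstairs_f: float   # main thermostat temperature downstairs (°F)
    season: str           # "summer" or "winter"

def should_circulate(r, cool_setpoint_f=76.0, max_spread_f=4.0):
    """True when it's summer, downstairs is already cool, and upstairs is much warmer.

    In that case running just the fan to mix the air may be enough, instead
    of running the air conditioner.
    """
    spread = r.upstairs_f - r.downstairs_f
    return r.season == "summer" and r.downstairs_f < cool_setpoint_f and spread > max_spread_f

print(should_circulate(Reading(upstairs_f=80, downstairs_f=72, season="summer")))
```

Keeping the thresholds as parameters makes it easy to tune the rule once there's logged data showing how the house actually behaves.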
I’ve seen some Arduino-based custom thermostat projects and figured it was doable, but it never bubbled to the top of my list.
Procrastination paid off because now there are some great thermostats on the market and one of them hits almost all of my required features: the ecobee3.
The biggest thing that held me back from installing something like this before was the wiring to our thermostat. All of these new wifi models require a common (“C”) wire, which provides power. There are some hacks you can do to make one of the wires perform double duty, and the ecobee3 even comes with a kit for that, but I really wanted it done “correctly.” So when Dad and Mom were visiting last week, I descended into the crawl space while Dad helped from above, and we fished a new line through the walls. We left the old wire bundle in place, so I now have 11 wires running to my thermostat (4 in the old bundle plus 7 in the new). I’ll never need more than 5, but whatever, I’m future-proof.
Setup was a breeze and it even told me that I had one wire connected incorrectly. The touchscreen on the thermostat walked me through connecting to the wifi and basic setup. I also connected the remote sensor and placed it upstairs in our bedroom.
I jumped onto the web interface and looked at the myriad of ways that I could customize the scheduling and also added the apps to our phones. (There is a beautiful Windows Phone app available and I’m sure they have Android and iOS too.) It’s so nice to program a thermostat by clicking around in a browser window instead of punching buttons on a little device. The default software has lots of nice features such as “use the upstairs thermometer to control things at night” and “run the fan extra when the difference between the two thermometers is greater than normal”.
Another big reason for buying this specific unit is that it has a nice API. I spent some of my free time in the next few days writing a program to download all of the data from the thermostat and upload it to my SQL Azure database. That database now has information about furnace, air conditioner, and fan runtimes as well as individual sensor temperature and humidity values. It’s all recorded every 5 minutes so I will have tons of data to play with. The API also means that I could theoretically do fancier things like write a program to text message us if we should open the windows in the house or turn the furnace back up when we drive within 15 minutes of the house after being gone.
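Once readings land every 5 minutes, a lot of the fun starts with simple rollups. This Python sketch assumes the interval rows have already been downloaded as (timestamp, furnace seconds, AC seconds, fan seconds) tuples; that row shape is an assumption for illustration, not ecobee's actual API format.

```python
from collections import defaultdict
from datetime import datetime

def daily_runtime_minutes(intervals):
    """Roll 5-minute interval rows up into minutes of runtime per day.

    Each row is (timestamp, furnace_seconds, ac_seconds, fan_seconds), where the
    seconds say how long that equipment ran during the interval.
    Returns {date: (furnace_min, ac_min, fan_min)} in date order.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for stamp, furnace_s, ac_s, fan_s in intervals:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").date()
        for i, seconds in enumerate((furnace_s, ac_s, fan_s)):
            totals[day][i] += seconds
    return {day: tuple(s / 60 for s in secs) for day, secs in sorted(totals.items())}

# Made-up interval rows.
sample = [
    ("2016-01-05 06:00", 300, 0, 300),
    ("2016-01-05 06:05", 180, 0, 300),
    ("2016-01-06 06:00", 60, 0, 120),
]
for day, (furnace, ac, fan) in daily_runtime_minutes(sample).items():
    print(day, f"furnace {furnace:.0f} min, AC {ac:.0f} min, fan {fan:.0f} min")
```

Daily totals like these are what you'd line up against weather data to see how hard the furnace works on cold days.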
At $240, it’s a chunk of money to pay for a thermostat but if you’re at all interested in tinkering with this stuff, it’s a great product. I think it will shave some money off our bills too so we’ll recoup some of that cost, but mostly it’s just fun!
Internet of Things
You may have heard about the “internet of things”, but what is it? At its core, it’s the idea that we can collect a lot of data about various parts of our lives with simple little devices. (IoT also includes the ability for the devices to perform operations, but I’m mostly interested in the data side for this post.) All those data points may seem insignificant if you look at a single source for a single day, but if you start looking at these data streams over years and combine them with dozens of other data feeds, you can learn some really interesting things.
My main frustration is that all of these different devices are silos of information. I can’t take information from my Fitbit and combine it with the GPS data from my phone or the data from my sprinkler. Why would I? Who knows, but that’s kind of the point. If you can’t get at these data sets, your ability to learn from them is severely limited.
Thankfully a lot of these devices have APIs available. I’ve started writing little programs that pull the data down to my computer and then dump them all into a single database. Right now I have tables that show the weather each day, how long my sprinklers were on, and the weight reported by our WiFi scale. We’re upgrading to a WiFi thermostat soon so I hope to have another table that shows how long our furnace was running. I’m still trying to figure out the best way to get per-circuit monitoring and logging for my electrical panel too.
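The payoff of a single database is that combining feeds becomes one query. Here is a minimal Python sketch using SQLite, assuming each device gets its own table keyed by day; the table names, columns, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up rows; each table stands in for one device's feed, keyed by day.
conn.executescript("""
    CREATE TABLE weather   (day TEXT PRIMARY KEY, high_f REAL);
    CREATE TABLE sprinkler (day TEXT PRIMARY KEY, runtime_minutes INTEGER);
    CREATE TABLE scale     (day TEXT PRIMARY KEY, weight_lb REAL);
    INSERT INTO weather   VALUES ('2015-07-05', 84), ('2015-07-06', 91);
    INSERT INTO sprinkler VALUES ('2015-07-06', 25);
    INSERT INTO scale     VALUES ('2015-07-05', 180.2);
""")

# LEFT JOINs keep every weather day even when another feed has a gap.
rows = conn.execute("""
    SELECT w.day, w.high_f, s.runtime_minutes, sc.weight_lb
    FROM weather AS w
    LEFT JOIN sprinkler AS s ON s.day = w.day
    LEFT JOIN scale AS sc ON sc.day = w.day
    ORDER BY w.day
""").fetchall()
for row in rows:
    print(row)
```

Gaps show up as `None`, which is honest about a device that didn't report that day instead of silently dropping the row.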
Some day we’ll have a great service that combines all of these things for us, but until then, I’m hoarding the data. It’s a fun data science distraction every now and then.
Longer Days
The rain in Seattle doesn’t bother me. I find what affects me the most is the lack of sunshine. Woodinville is 6 degrees further north than my hometown in Indiana and that correlates to about 30-40 minutes less sunshine in the winter than I grew up with.
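The sunshine gap comes out of the basic sunrise equation, cos(h0) = -tan(latitude) × tan(declination). Here is a minimal Python sketch; 47.75°N is roughly Woodinville's latitude, 41.75°N is simply a point six degrees south (not a specific town), and -23.44° is the sun's declination at the December solstice.

```python
import math

def day_length_hours(latitude_deg, declination_deg):
    """Hours of daylight from the basic sunrise equation.

    Ignores atmospheric refraction and the size of the sun's disk, so real
    daylight runs a few minutes longer than this.
    """
    x = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(declination_deg))
    x = max(-1.0, min(1.0, x))        # clamp: x > 1 is polar night, x < -1 is midnight sun
    hour_angle_deg = math.degrees(math.acos(x))
    return 2 * hour_angle_deg / 15    # the sun covers 15 degrees of hour angle per hour

SOLSTICE_DECLINATION = -23.44
north = day_length_hours(47.75, SOLSTICE_DECLINATION)
south = day_length_hours(41.75, SOLSTICE_DECLINATION)
print(f"{north:.2f} h vs {south:.2f} h: {(south - north) * 60:.0f} minutes apart")
```

The gap is widest at the solstice and shrinks to zero at the equinoxes (declination 0), so how big it feels depends on which part of winter you average over.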
We all know that you get less sunshine in the winter than in the summer, and that the farther north you go, the less sunshine you get in the winter. But did you know that there are more seconds in a winter day than in a summer day? A solar day is defined by the time it takes the same point on Earth to directly face the sun again. We aren’t just rotating around the axis of the planet; we are also orbiting the sun, and we are not always the same distance from the sun. Right now Earth’s orbit is set up so that we are closest to the sun in early January, during our Northern Hemisphere winter. That means we move faster along our orbit than we do when we are farther from the sun. If we’re moving faster, the Earth has to rotate a little bit extra to point back at the sun again. Add in the effect of the tilt of Earth’s axis, and our winter days have about 30 “extra” seconds in them!
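The orbital-speed part of this can be checked with a little arithmetic. This Python sketch uses approximate textbook values for the sidereal day, the sidereal year, and Earth's orbital eccentricity, plus Kepler's second law for how fast Earth moves along its orbit at its closest (perihelion) and farthest (aphelion) points. It models only the orbital-speed effect, which on its own works out to roughly 8 seconds; most of the rest of the roughly 30 seconds around the December solstice comes from the tilt of Earth's axis, which this sketch leaves out.

```python
import math

SIDEREAL_DAY_S = 86164.0905            # one rotation relative to the distant stars
SIDEREAL_YEAR_S = 365.25636 * 86400    # one orbit relative to the distant stars
ECCENTRICITY = 0.0167                  # Earth's orbit, approximately

def orbital_speed_factor(at_perihelion, e=ECCENTRICITY):
    """Earth's angular speed around the sun relative to its average (Kepler's second law)."""
    r = (1 - e) if at_perihelion else (1 + e)   # distance in units of the semi-major axis
    return math.sqrt(1 - e * e) / r ** 2

def solar_day_seconds(speed_factor):
    """Time for the same spot on Earth to face the sun again.

    While Earth spins, it also moves along its orbit, so it must turn a little
    more than one full rotation; the faster it orbits, the more extra turning.
    """
    spin_rate = 360.0 / SIDEREAL_DAY_S                   # degrees per second
    orbit_rate = 360.0 / SIDEREAL_YEAR_S * speed_factor  # degrees per second
    return 360.0 / (spin_rate - orbit_rate)

for label, near in (("perihelion (early January)", True), ("aphelion (early July)", False)):
    extra = solar_day_seconds(orbital_speed_factor(near)) - 86400
    print(f"{label}: {extra:+.1f} s compared with 24 hours")
```

Plugging in a speed factor of 1 (average orbital speed) gives back the familiar 86,400-second day, which is a handy sanity check on the constants.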
If you want to see this all explained in a quick video, check out Minute Physics:
Disaster Planning: Passwords
How many times have I written about making sure you have a good backup plan in place? I’m too lazy to look, but it’s a lot. I’ll summarize on the off chance that you don’t remember the plan:
Imagine that at this exact moment, your house explodes in a giant fireball. Nobody is injured but all of your computers are obliterated. Did you lose files? If the answer is yes then you’ve failed. Luckily, the remedy is very easy: sign up for an online backup service and run it on all your computers. I like crashplan.com but there are lots of other options. If you’re running that, no matter what happens to your physical machines, you can still get your files.
I saw two data loss situations happen in December and it reminded me of another thing to think about. Dad, I’ll use you as an example since it has a happy ending.
Dad has a good backup strategy in place. It’s not cloud backup because they’re in a limited bandwidth situation, but he does a good job with a physical offsite backup solution. While I was home at Christmas, his main hard drive died. The first thought is “Whew, at least I don’t have to worry about data loss.” But then when he went to restore from his backup drive, he realized that he didn’t (right away) remember his BitLocker password. The whole drive was encrypted so it was only useful if he had that master password. Thankfully he was able to get it and everything went smoothly from there.
But this is a good opportunity for us to evaluate our own backup strategies. If you lose all your computers, can you still access or remember all your passwords? In our house we use lastpass.com, but there are other options. I’ve written about LastPass before. It’s a free tool that securely stores all of your passwords online, and you have one (very strong, very long) master password that you use to access them. I won’t go into details here, but it’s practically impossible to get into your account unless someone is able to guess that master password. Tyla and I both know the master password, so even if our house blows up and I’m in it, she’ll still be able to get to every password that we have.
Disaster planning can seem like a daunting task but if you just use a couple services (crashplan.com and lastpass.com), you’ll be covered in a huge range of situations.
Fantasy Football Super Stats
I wrote a tool for our fantasy football league that checks the rosters twice a week and sends out emails if you have anyone starting a game who is injured, on a bye week, or just questionable. That, along with lots of effort from each coach, has made our league pretty good in terms of starting full rosters every week.
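The heart of a checker like that is a filter over each starting lineup. Here is a minimal Python sketch, assuming the rosters have already been fetched (the Yahoo API call isn't shown); the `RosterSlot` shape, the `"BN"` bench label, and the set of status codes that trigger an alert are assumptions, and Yahoo's actual codes may differ.

```python
from dataclasses import dataclass

# Hypothetical statuses worth an email (out, injured reserve, questionable, doubtful).
ALERT_STATUSES = {"O", "IR", "Q", "D"}

@dataclass
class RosterSlot:
    player: str
    slot: str       # e.g. "QB", "WR", or "BN" for the bench
    status: str     # injury designation, "" when healthy
    on_bye: bool

def starting_problems(roster):
    """List starters who are on a bye or carry an injury designation."""
    problems = []
    for p in roster:
        if p.slot == "BN":
            continue  # bench players can't cost you points
        if p.on_bye:
            problems.append(f"{p.player} ({p.slot}) is on a bye")
        elif p.status in ALERT_STATUSES:
            problems.append(f"{p.player} ({p.slot}) is listed as {p.status}")
    return problems

roster = [
    RosterSlot("Player A", "QB", "", True),
    RosterSlot("Player B", "WR", "Q", False),
    RosterSlot("Player C", "BN", "O", False),
    RosterSlot("Player D", "RB", "", False),
]
for line in starting_problems(roster):
    print(line)
```

An email only needs to go out when the returned list is non-empty, which keeps the reminders from turning into noise.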
I started playing around more with the Yahoo API and realized that I could download stats for every game. Not just that, I could download stats for all nine seasons that we’ve had this league! Give a data scientist a fun data set like this and see what happens. None of the stuff I did is very complicated in terms of real “data science”, but it sure was fun!
First of all, I have some corrections to make to the table of records. One of these is due to a stat correction from Yahoo and the other is because Yahoo’s online interface didn’t let me go back far enough to see the old record. Here is the updated table with changes in bold. It turns out that Tim didn’t break the record for highest score in a game or the highest player score in a game.
Record | This Week | Season | All-Time |
Highest Team Score | Ben had 151.18 | Tim had 200.51 (Week 3) | Tim had 206.94 in 2008 |
Lowest Team Score | Andy had 75.31 | Andy had 41.29 (Week 14) | Andy had 41.29 (2015) |
Biggest Blowout | Luke beat Andy by 42.02 | Ben beat Dad by 111.43 (Week 8) | Luke beat Andy by 113.02 (2010) |
Closest Win | Austin beat Logan by 16.44 | Ben beat Andy by 2.46 (Week 7) | Jim beat Ben by 0.12 (2012) |
Highest Scoring Player | Kirk Cousins had 40.20 as a free agent. | Drew Brees had 58.30 on Tim’s bench (Week 8) | Peyton Manning had 60.28 for Andy (2013) |
Longest Active Winning Streak | Austin has a 4-game winning streak. | Ben had an 8-game winning streak (Week 8) | Micah (2011) and Ben (2015) had 8-game winning streaks |
Longest Active Losing Streak | Jim has a 4-game losing streak. | Andy had an 8-game losing streak (Week 14) | Kyle had a 14-game losing streak (2011) |
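The streak rows in the records table need two slightly different calculations: the longest run anywhere in a season, and the run a team is currently on. Here is a minimal Python sketch, assuming each team's results are stored in order as a list of "W", "L", and (rarely) "T"; the sample season is made up.

```python
def longest_streak(results, outcome):
    """Longest run of consecutive `outcome` ("W" or "L") in a sequence of results."""
    best = current = 0
    for r in results:
        current = current + 1 if r == outcome else 0
        best = max(best, current)
    return best

def active_streak(results):
    """The streak a team is currently on, e.g. ("W", 4); a tie or empty season gives ("", 0)."""
    if not results or results[-1] not in ("W", "L"):
        return ("", 0)
    last = results[-1]
    count = 0
    for r in reversed(results):
        if r != last:
            break
        count += 1
    return (last, count)

season = ["L", "W", "W", "W", "L", "W", "W", "W", "W"]  # made-up results
print(longest_streak(season, "W"), active_streak(season))
```

Treating a tie as a streak-breaker is a judgment call; a league could just as reasonably skip ties.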
Ok, but that’s pretty boring. What else can we answer? How about who had the most points on their bench? Instead of just counting the total, let’s look at the percentage of the total team’s points that sat on the bench. This stat is mostly just a curiosity. You could have a lot of points on the bench because you started the wrong people or because your team was really strong.
Coach | Bench Points (% of Team Total) |
Tim Scherschel | 36.0% |
Dallon Martens | 32.4% |
Ben Martens | 32.3% |
Logan Brandt | 32.1% |
League Total | 31.7% |
Luke | 31.2% |
Jim | 30.9% |
Austin | 30.0% |
Andy Daniels | 27.6% |
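The bench percentage is each team's bench points divided by all the points its players scored, starters and bench together. Here is a minimal Python sketch, assuming per-player weekly rows of (team, points, started); the row shape and sample data are hypothetical, and pooling every team's points for the league line (rather than averaging the team percentages) is one reasonable choice, not necessarily the one used for the table above.

```python
from collections import defaultdict

def bench_percentages(player_weeks):
    """Percent of each team's total points scored by players left on the bench.

    `player_weeks` holds (team, points, started) tuples, one per player per week.
    The "League Total" entry pools every team's points rather than averaging
    the per-team percentages.
    """
    bench = defaultdict(float)
    total = defaultdict(float)
    for team, points, started in player_weeks:
        total[team] += points
        if not started:
            bench[team] += points
    shares = {team: 100 * bench[team] / total[team] for team in total if total[team]}
    league_total = sum(total.values())
    if league_total:
        shares["League Total"] = 100 * sum(bench.values()) / league_total
    return shares

rows = [("Team A", 30.0, True), ("Team A", 10.0, False), ("Team B", 45.0, True), ("Team B", 5.0, False)]
for team, pct in sorted(bench_percentages(rows).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team} | {pct:.1f}% |")
```

Sorting the results descending reproduces the layout of the table, with the league line landing wherever it falls among the teams.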
Which positions produce the most average points in a game?
Position | Average Points |
QB | 22.3 |
WR | 13.2 |
RB | 11.4 |
DEF | 10.7 |
TE | 10.1 |
K | 8.3 |
I broke the all-time record for most points in a season. But how does it compare to everyone else over the past seasons?
How does the average number of points per team vary by year? Does it all average out or is there more scoring in some years?
Which players have generated the most points since 2007 in our league?
Player | Total Points Since 2007 |
Drew Brees | 3089.66 |
Tom Brady | 2940.34 |
Aaron Rodgers | 2825.2 |
Peyton Manning | 2602.78 |
Philip Rivers | 2300.68 |
Tony Romo | 2106.5 |
Ben Roethlisberger | 2061.72 |
Adrian Peterson | 2026.62 |
Eli Manning | 1969.96 |
Matt Ryan | 1900.98 |
Who were the top 3 players at each position in our league this year?
Player | Coach | Position | Points |
Tom Brady | Ben Martens | QB | 410.94 |
Russell Wilson | Luke | QB | 377.18 |
Cam Newton | Austin | QB | 376.46 |
Antonio Brown | Austin | WR | 296.93 |
Julio Jones | Logan Brandt | WR | 287.70 |
Odell Beckham Jr. | Ben Martens | WR | 264.00 |
Adrian Peterson | Jim | RB | 232.40 |
Devonta Freeman | Ben Martens | RB | 205.60 |
Doug Martin | Dallon Martens | RB | 201.50 |
Rob Gronkowski | Austin | TE | 216.94 |
Greg Olsen | Tim Scherschel | TE | 186.30 |
Tyler Eifert | Logan Brandt | TE | 152.40 |
Stephen Gostkowski | Dallon Martens | K | 167.00 |
Steven Hauschka | Tim Scherschel | K | 127.00 |
Mason Crosby | Logan Brandt | K | 116.00 |
Arizona | Dallon Martens | DEF | 200.86 |
Denver | Ben Martens | DEF | 196.94 |
Seattle | Tim Scherschel | DEF | 180.70 |
Who has had the easiest schedule for the history of our league?
Tim. Dad has had the hardest schedule. But really, they are all very similar. Nobody deviated more than 2% from the mean.
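One way to measure schedule difficulty is the average score of the opponents each team faced, compared with the league-wide average. This Python sketch assumes games are stored as (team, score, opponent, opponent score) tuples with placeholder team names; that definition of "easiest schedule" is an assumption, since the post doesn't say exactly how it was measured.

```python
from collections import defaultdict
from statistics import mean

def schedule_difficulty(games):
    """Each team's average opponent score, as % deviation from the league mean.

    `games` holds (team_a, score_a, team_b, score_b) tuples. Positive numbers
    mean tougher opponents than average; negative means an easier schedule.
    """
    faced = defaultdict(list)
    for team_a, score_a, team_b, score_b in games:
        faced[team_a].append(score_b)
        faced[team_b].append(score_a)
    league_mean = mean(s for scores in faced.values() for s in scores)
    return {team: 100 * (mean(scores) - league_mean) / league_mean for team, scores in faced.items()}

games = [("A", 100, "B", 80), ("A", 90, "C", 110), ("B", 95, "C", 105)]  # made-up scores
for team, pct in sorted(schedule_difficulty(games).items(), key=lambda kv: kv[1]):
    print(f"{team}: {pct:+.1f}%")
```

Expressing it as a percentage makes the "nobody deviated more than 2% from the mean" kind of statement easy to check directly.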
Who had the most close wins in the last four years? This is a count of the number of times that each person won by less than 10 points.
Coach | Wins by Less Than 10 Points |
Jim | 8 |
Andy Daniels | 8 |
Logan Brandt | 7 |
Tim Scherschel | 7 |
Dallon Martens | 7 |
Austin | 5 |
Ben Martens | 5 |
Luke | 5 |
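Counting close wins is a single pass over the games. Here is a minimal Python sketch, assuming games are (team, score, opponent, opponent score) tuples already filtered to the seasons you care about; the tuple shape and sample scores are hypothetical, and the 10-point margin matches the definition above.

```python
from collections import Counter

def close_wins(games, margin=10.0):
    """Count wins by fewer than `margin` points for each team. Ties count for nobody."""
    counts = Counter()
    for team_a, score_a, team_b, score_b in games:
        if score_a != score_b and abs(score_a - score_b) < margin:
            winner = team_a if score_a > score_b else team_b
            counts[winner] += 1
    return counts

games = [("A", 101.5, "B", 95.0), ("C", 88.0, "A", 120.0), ("B", 99.0, "C", 90.5)]  # made-up scores
for team, wins in close_wins(games).most_common():
    print(f"{team} | {wins} |")
```

Teams with zero close wins won't appear in the `Counter`, so a full table should loop over the league's roster of teams and default missing ones to 0.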
There are so many more questions on my list to answer, but frankly I’ve been spending a lot of late nights digging through this. Hopefully if I get this post out, I’ll stop for a while and go to sleep!
If you’re interested in this dataset, let me know. I can either ship it to you in Excel or I can give you access to the SQL database and you can run your own queries. I hope to keep it running next year as the season progresses. I think I’ll also be able to use it to generate the Weekly Awards table each week which will make my blog posts easier.
Thanks to everyone for a great season! Go Seahawks!