Studio711.com – Ben Martens


Ecobee3 Thermostat Review

For years I’ve had a thermostat project sitting on the back burner. The key features I wanted to build into it were:

  1. Ability to set the thermostat from our phones
  2. Support for remote sensors so the thermostat can use temperatures from around the house
  3. Logging of all the temperature sensors as well as the runtimes for the furnace, air conditioner and fan.
  4. Advanced programming capabilities such as: in the summer if it’s hot upstairs but cool downstairs, just run the fan more to circulate the air.

I’ve seen some Arduino-based custom thermostat projects and figured it was doable, but it never bubbled to the top of my list.

Procrastination paid off because now there are some great thermostats on the market and one of them hits almost all of my required features: the ecobee3.

The biggest thing that held me back from installing something like this before was the wiring to our thermostat. All of these new wifi models require a common (“C”) wire which provides power. There are some hacks to make one of the existing wires perform double duty, and the ecobee3 even comes with a kit for that, but I really wanted it done “correctly.” So when Dad and Mom were visiting last week, I descended into the crawl space while Dad helped from above and we fished a new line through the walls. We left the old wire bundle in place, so I now have 11 wires running to my thermostat (4 in the old bundle plus 7 in the new). I’ll never need more than 5 but whatever, I’m future proof.

Setup was a breeze and it even told me that I had one wire connected incorrectly. The touchscreen on the thermostat walked me through connecting to the wifi and basic setup. I also connected the remote sensor and placed it upstairs in our bedroom.

I jumped onto the web interface and looked at the myriad of ways that I could customize the scheduling and also added the apps to our phones. (There is a beautiful Windows Phone app available and I’m sure they have Android and iOS too.) It’s so nice to program a thermostat by clicking around in a browser window instead of punching buttons on a little device. The default software has lots of nice features such as “use the upstairs thermometer to control things at night” and “run the fan extra when the difference between the two thermometers is greater than normal”.

Another big reason for buying this specific unit is that it has a nice API. I spent some free time over the next few days writing a program to download all of the data from the thermostat and upload it to my SQL Azure database. That database now has information about furnace, air conditioner, and fan runtimes as well as individual sensor temperature and humidity values. It’s all recorded every 5 minutes, so I will have tons of data to play with. The API also means that I could theoretically do fancier things like write a program to text us if we should open the windows in the house, or turn the furnace back up when we drive within 15 minutes of the house after being gone.
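
If you’re curious what that kind of downloader might look like, here’s a minimal Python sketch built around ecobee’s runtimeReport endpoint. Treat it as a sketch: the OAuth token handling is omitted, and the column list, thermostat selection, and table schema are illustrative placeholders rather than a copy of my actual program.

```python
import json
import requests
import pyodbc  # ODBC connection to SQL Azure

ACCESS_TOKEN = "..."  # obtained via ecobee's OAuth PIN authorization flow

# Request 5-minute interval data from the runtimeReport endpoint.
# The column names here are illustrative; see ecobee's API docs.
body = {
    "startDate": "2016-01-01",
    "endDate": "2016-01-02",
    "columns": "auxHeat1,compCool1,fan,zoneAveTemp,zoneHumidity",
    "selection": {"selectionType": "thermostats", "selectionMatch": "..."},
}
resp = requests.get(
    "https://api.ecobee.com/1/runtimeReport",
    params={"format": "json", "body": json.dumps(body)},
    headers={"Authorization": "Bearer " + ACCESS_TOKEN},
)
resp.raise_for_status()
report = resp.json()

# Each row comes back as a CSV string: "date,time,col1,col2,..."
conn = pyodbc.connect("...")  # SQL Azure connection string elided
for row in report["reportList"][0]["rowList"]:
    day, time, heat, cool, fan, temp, humidity = row.split(",")
    conn.execute(
        "INSERT INTO Runtime (LogDate, LogTime, Heat, Cool, Fan, Temp, Humidity) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        day, time, heat, cool, fan, temp, humidity,
    )
conn.commit()
```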

At $240, it’s a chunk of money to pay for a thermostat but if you’re at all interested in tinkering with this stuff, it’s a great product. I think it will shave some money off our bills too so we’ll recoup some of that cost, but mostly it’s just fun!

Internet of Things

You may have heard about the “internet of things”, but what is it? At its core, it’s the idea that we can collect a lot of data about various parts of our lives with simple little devices. (IoT also includes the ability for the devices to perform operations, but I’m mostly interested in the data side for this post.) All those data points may seem insignificant if you look at a single source for a single day, but if you start looking at these data streams over years and combine them with dozens of other data feeds, you can learn some really interesting things.

My main frustration is that all of these different devices are silos of information. I can’t take information from my Fitbit and combine it with the GPS data from my phone or the data from my sprinkler. Why would I? Who knows, but that’s kind of the point. If you can’t get at these data sets, your ability to learn from them is severely limited.

Thankfully a lot of these devices have APIs available. I’ve started writing little programs that pull the data down to my computer and then dump them all into a single database. Right now I have tables that show the weather each day, how long my sprinklers were on, and the weight reported by our WiFi scale. We’re upgrading to a WiFi thermostat soon so I hope to have another table that shows how long our furnace was running. I’m still trying to figure out the best way to get per-circuit monitoring and logging for my electrical panel too.
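
To make the pattern concrete, here’s a minimal sketch of what one of those little programs looks like. The `feeds` module and its functions are hypothetical stand-ins for each device’s API client; the real versions are mostly OAuth handshakes and JSON parsing.

```python
import sqlite3
from datetime import date

# Hypothetical wrappers around each device's API (weather, sprinkler, scale).
from feeds import get_weather_high_f, get_sprinkler_minutes, get_scale_weight_lb

conn = sqlite3.connect("home_data.db")
conn.execute("""CREATE TABLE IF NOT EXISTS daily (
    day TEXT, source TEXT, metric TEXT, value REAL,
    PRIMARY KEY (day, source, metric))""")

# One skinny table keeps every feed queryable side by side.
today = date.today().isoformat()
readings = [
    (today, "weather", "high_temp_f", get_weather_high_f()),
    (today, "sprinkler", "runtime_min", get_sprinkler_minutes()),
    (today, "scale", "weight_lb", get_scale_weight_lb()),
]
conn.executemany("INSERT OR REPLACE INTO daily VALUES (?, ?, ?, ?)", readings)
conn.commit()
```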

Some day we’ll have a great service that combines all of these things for us, but until then, I’m hoarding the data. It’s a fun data science distraction every now and then.

Longer Days

The rain in Seattle doesn’t bother me. I find that what affects me the most is the lack of sunshine. Woodinville is 6 degrees farther north than my hometown in Indiana, and that works out to about 30-40 minutes less sunshine in the winter than I grew up with.

We all know that you get less sunshine in the winter than in the summer, and that the farther north you go, the less winter sunshine you get. But did you know that a winter day is actually slightly longer than a summer day? A day is defined by the time it takes the same point on earth to directly face the sun again. We aren’t just rotating around the axis of the planet; we are also orbiting the sun, and we are not always the same distance from it. Right now the solar system is set up so that we are closest to the sun during the northern hemisphere’s winter. When we’re closer to the sun, we move faster around our orbit than when we’re farther away. If we’re moving faster, the earth has to rotate a little bit extra to point back at the sun again. Our winter days have about 30 “extra” seconds in them!
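
If you want to check the arithmetic, here’s a quick back-of-the-envelope calculation in Python. It only models the orbital-speed effect described above, which accounts for roughly a ±8 second swing around the average catch-up time; Earth’s axial tilt contributes the rest of the ~30 seconds, so treat this as a sketch rather than an ephemeris.

```python
# How much extra rotation does Earth need to face the sun again?
SIDEREAL_DAY_S = 86164.1         # one rotation relative to the stars
MEAN_MOTION_DEG = 360 / 365.256  # average orbital motion per day (~0.986 deg)
ECC = 0.0167                     # eccentricity of Earth's orbit

def catch_up_seconds(r_over_a):
    # Angular momentum conservation: orbital angular speed scales as 1/r^2,
    # so the daily angle Earth must "catch up" by rotating scales the same way.
    daily_angle = MEAN_MOTION_DEG / r_over_a ** 2
    return daily_angle / 360 * SIDEREAL_DAY_S

perihelion = catch_up_seconds(1 - ECC)  # closest to the sun (early January)
aphelion = catch_up_seconds(1 + ECC)    # farthest from the sun (early July)
print(f"catch-up at perihelion: {perihelion:.0f} s")             # ~244 s
print(f"catch-up at aphelion:   {aphelion:.0f} s")               # ~228 s
print(f"difference:             {perihelion - aphelion:.0f} s")  # ~16 s
```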

If you want to see this all explained in a quick video, check out Minute Physics:

Disaster Planning: Passwords

How many times have I written about making sure you have a good backup plan in place? I’m too lazy to look, but it’s a lot. I’ll summarize on the off chance that you don’t remember the plan:

Imagine that at this exact moment, your house explodes in a giant fireball. Nobody is injured but all of your computers are obliterated. Did you lose files? If the answer is yes then you’ve failed. Luckily, the remedy is very easy: sign up for an online backup service and run it on all your computers. I like crashplan.com but there are lots of other options. If you’re running that, no matter what happens to your physical machines, you can still get your files.

I saw two data loss situations happen in December and it reminded me of another thing to think about. Dad, I’ll use you as an example since it has a happy ending.

Dad has a good backup strategy in place. It’s not cloud backup because they’re in a limited bandwidth situation, but he does a good job with a physical offsite backup solution. While I was home at Christmas, his main hard drive died. His first thought was, “Whew, at least I don’t have to worry about data loss.” But when he went to restore from his backup drive, he realized that he didn’t (right away) remember his BitLocker password. The whole drive was encrypted, so it was only useful if he had that master password. Thankfully he was able to get it and everything went smoothly from there.

But this is a good opportunity for us to evaluate our own backup strategies. If you lose all your computers, can you still access or remember all your passwords? In our house we use lastpass.com, but there are other options. I’ve written about LastPass before. It’s a free tool that securely stores all of your passwords online, and then you have one (very strong, very long) master password that you use to access those passwords. I won’t go into details here, but it’s effectively impossible for someone to get into your account unless they can guess that master password. Tyla and I both know the master password, so even if our house blows up and I’m in it, she’ll still be able to get to every password that we have.

Disaster planning can seem like a daunting task but if you just use a couple services (crashplan.com and lastpass.com), you’ll be covered in a huge range of situations.

Fantasy Football Super Stats

I wrote a tool for our fantasy football league that checks the rosters twice a week and sends out emails if you have anyone starting a game who is injured, on a bye week, or just questionable. That, along with lots of effort from each coach, has made our league pretty good in terms of starting full rosters every week.

I started playing around more with the Yahoo API and realized that I could download stats for every game. Not just that, I could download stats for all nine seasons that we’ve had this league! Give a data scientist a fun data set like this and see what happens. None of the stuff I did is very complicated in terms of real “data science”, but it sure was fun!
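
If you want to try this with your own league, here’s roughly what the pull looks like in Python. It assumes you’ve already completed Yahoo’s OAuth flow and have a bearer token; the league key and week range are placeholders, and Yahoo’s JSON is deeply nested, so the parsing is left out.

```python
import requests

TOKEN = "..."                # bearer token from Yahoo's OAuth flow
LEAGUE_KEY = "nfl.l.123456"  # hypothetical league key
BASE = "https://fantasysports.yahooapis.com/fantasy/v2"

def get(resource):
    # Every Fantasy Sports resource hangs off the same v2 base URL.
    resp = requests.get(
        f"{BASE}/{resource}",
        params={"format": "json"},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()

# Pull every week's matchups for the season, one request per week.
for week in range(1, 17):
    scoreboard = get(f"league/{LEAGUE_KEY}/scoreboard;week={week}")
    # ...walk the nested JSON and write each matchup to the database
```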

First of all, I have some corrections to make to the table of records. One of these is due to a stat correction from Yahoo and the other is because Yahoo’s online interface didn’t let me go back far enough to see the old record. Here is the updated table with changes in bold. It turns out that Tim didn’t break the record for highest score in a game or the highest player score in a game.

Record | This Week | Season | All-Time
Highest Team Score | Ben had 151.18 | Tim had 200.51 (Week 3) | Tim had 206.94 (2008)
Lowest Team Score | Andy had 75.31 | Andy had 41.29 (Week 14) | Andy had 41.29 (2015)
Biggest Blowout | Luke beat Andy by 42.02 | Ben beat Dad by 111.43 (Week 8) | Luke beat Andy by 113.02 (2010)
Closest Win | Austin beat Logan by 16.44 | Ben beat Andy by 2.46 (Week 7) | Jim beat Ben by 0.12 (2012)
Highest Scoring Player | Kirk Cousins had 40.20 as a free agent | Drew Brees had 58.30 on Tim’s bench (Week 8) | Peyton Manning had 60.28 for Andy (2013)
Longest Winning Streak | Austin has a 4 game winning streak (active) | Ben had an 8 game winning streak (Week 8) | Micah (2011) and Ben (2015) had 8 game winning streaks
Longest Losing Streak | Jim has a 4 game losing streak (active) | Andy had an 8 game losing streak (Week 14) | Kyle had a 14 game losing streak (2011)

Ok, but that’s pretty boring. What else can we answer? How about who had the most points on their bench? Instead of just counting the raw total, let’s look at the percentage of each team’s points that sat on the bench (there’s a sketch of the computation after the numbers below). This stat is mostly just a curiosity: you could have a lot of points on the bench because you started the wrong people or because your team was really strong.

Tim Scherschel 36.0%
Dallon Martens 32.4%
Ben Martens 32.3%
Logan Brandt 32.1%
League Total 31.7%
Luke 31.2%
Jim 30.9%
Austin 30.0%
Andy Daniels 27.6%
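
As promised, here’s a sketch of how a stat like that falls out of the database. The `weekly_stats` table and its columns are a hypothetical schema, not my actual one.

```python
import sqlite3
import pandas as pd

# Assume one row per player per week: who coached him, points scored,
# and whether he was in the starting lineup (0 = bench).
conn = sqlite3.connect("fantasy.db")
stats = pd.read_sql("SELECT coach, points, started FROM weekly_stats", conn)

total = stats.groupby("coach")["points"].sum()
bench = stats[stats["started"] == 0].groupby("coach")["points"].sum()
print((bench / total * 100).round(1).sort_values(ascending=False))
```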

Which positions produce the most average points in a game?

QB 22.3
WR 13.2
RB 11.4
DEF 10.7
TE 10.1
K 8.3

I broke the all-time record for most points in a season. But how does it compare to everyone else over the past seasons?

[Chart: total points by year for each coach]

How does the average number of points per team vary by year? Does it all average out or is there more scoring in some years?

Which players have generated the most points since 2007 in our league?

Drew Brees 3089.66
Tom Brady 2940.34
Aaron Rodgers 2825.2
Peyton Manning 2602.78
Philip Rivers 2300.68
Tony Romo 2106.5
Ben Roethlisberger 2061.72
Adrian Peterson 2026.62
Eli Manning 1969.96
Matt Ryan 1900.98

Who were the top 3 players at each position in our league this year?

Player | Coach | Position | Points
Tom Brady | Ben Martens | QB | 410.94
Russell Wilson | Luke | QB | 377.18
Cam Newton | Austin | QB | 376.46
Antonio Brown | Austin | WR | 296.93
Julio Jones | Logan Brandt | WR | 287.70
Odell Beckham Jr. | Ben Martens | WR | 264.00
Adrian Peterson | Jim | RB | 232.40
Devonta Freeman | Ben Martens | RB | 205.60
Doug Martin | Dallon Martens | RB | 201.50
Rob Gronkowski | Austin | TE | 216.94
Greg Olsen | Tim Scherschel | TE | 186.30
Tyler Eifert | Logan Brandt | TE | 152.40
Stephen Gostkowski | Dallon Martens | K | 167.00
Steven Hauschka | Tim Scherschel | K | 127.00
Mason Crosby | Logan Brandt | K | 116.00
Arizona | Dallon Martens | DEF | 200.86
Denver | Ben Martens | DEF | 196.94
Seattle | Tim Scherschel | DEF | 180.70

Who has had the easiest schedule in the history of our league?
Tim. Dad has had the hardest schedule. But really, they are all very similar. Nobody deviated more than 2% from the mean.

Who had the most close wins in the last four years? This is a count of the number of times that each person won by less than 10 points.

Jim 8
Andy Daniels 8
Logan Brandt 7
Tim Scherschel 7
Dallon Martens 7
Austin 5
Ben Martens 5
Luke 5

There are so many more questions on my list to answer, but frankly I’ve been spending a lot of late nights digging through this. Hopefully if I get this post out, I’ll stop for a while and go to sleep!

If you’re interested in this dataset, let me know. I can either ship it to you in Excel or I can give you access to the SQL database and you can run your own queries. I hope to keep it running next year as the season progresses. I think I’ll also be able to use it to generate the Weekly Awards table each week which will make my blog posts easier.

Thanks to everyone for a great season! Go Seahawks!

Flash Fill

If you spend much time in Excel, you have probably found yourself doing some repetitive tasks to clean up a column of data. Or maybe instead of doing it manually, you might craft some formulas to do it for you. If you’re just doing a one-off cleanup, make sure you consider the Flash Fill feature in Excel. I recently used this in a demo to my team at work and many people in the room were surprised by it so I figured there were probably a few of you reading this who could benefit from it as well.

The basic idea is that you have one or more input columns and an output column. You provide an example or two of what the output should be and then you click the Flash Fill button. A magical wizard in your computer figures out the pattern and repeats it for the rest of the column. For example, given a column of names like “John Smith”, you can type “Smith, John” next to the first one, click Flash Fill, and the rest of the column fills in automatically. It really does feel like magic the first couple times you use it. Sometimes the algorithm gets it wrong for a few cells, but you can just make a correction and it will update its algorithm and redo any cells that need it.

This video has a bunch of great examples:

If that’s not enough for you, check out prose-playground.cloudapp.net. The tech behind Flash Fill is open sourced out of Microsoft Research and this is one of the sample sites they have created. If you want to use the SDK, you can get it from GitHub.

Working With ISOs

We don’t get a lot of software on discs anymore, but it does still happen. What do you do when you need to install that software on a computer without a DVD drive? You could buy a USB DVD drive, but that’s a pain to hook up. Instead, consider turning that DVD into an “ISO” file: a single file that fully represents the disc. With modern versions of Windows, you can “mount” the ISO file and Windows treats it just like it would an actual disc in the computer. If your computer doesn’t have this mounting feature, there are plenty of free software solutions. Virtual Clone Drive is a great one.

How do you make an ISO? Grab a free copy of ImgBurn. It has tons of options including making an ISO from a disc and burning an ISO back to a disc.

This might seem slightly geeky but it’s a great way to get rid of all those discs sitting in the closet. Just rip them to ISO files and then toss the discs. You can easily access them from any computer in your house and if you really need a disc again, you can burn it.

Cloud Computing Job Description

(Note: I generally try to avoid specifically mentioning my company name because then I get picked up by search engines and news articles and incorrectly quoted as an official company resource. Let me be clear that none of this is an official company statement. I’m just explaining things from my current, personal viewpoint.)

I’ve worked in some interesting groups in Microsoft. The most recent was probably the easiest to explain. I’d just tell people to go to powerbi.com and they could either watch a couple videos or even sign up for a free account. Now I’ve made the switch to Azure Compute and I’m in more of a data scientist role so I thought I’d take some time to explain what that all means.

Azure and Cloud Computing
Azure is Microsoft’s cloud offering. It’s a direct competitor to Amazon Web Services (AWS). It’s hard to know who is bigger because these companies don’t publish lots of numbers, but it’s probably safe to assume that AWS has more customers than Azure. Watch for that to change in the future though. Azure is growing by leaps and bounds.

That’s great, but what is “the cloud”? In the past, businesses have operated their own data centers. It’s generally a building full of racks and racks of computers with blinky lights and cables all over the place. Each computer is purpose-built for a specific task, and maybe there’s an identical one sitting right next to it in case the first one fails. There is a huge overhead cost for the building (including cooling, electricity, maintenance, etc.) as well as the need to amortize the cost of all the equipment over many years. Companies often end up running on old equipment because they’re still trying to pay it off.

And oh yeah, even if you can get a datacenter running smoothly and paying for itself, if you need to be concerned about compliance, certifications, and international law, be prepared to build new data centers in a variety of countries or even in specific countries. There are all kinds of laws that say, for example, you cannot move user data out of Europe if it was created there. How’s that going to work if your only datacenter is in the United States? There have been huge international changes in this area after all the Snowden revelations. And oh yeah, even if you do somehow manage to build out a global network of datacenters, some countries won’t let you do business there unless they run the datacenter themselves! You don’t even get a key to the building that you paid to construct. It quickly becomes a nightmare to manage on your own.

Cloud computing changes that whole model. Microsoft, Amazon, and others are building enormous data centers around the world. Instead of designing and buying all of your own equipment, you just rent time from the provider. In Microsoft’s case, that is the Azure service. There are many different products inside of the Azure world, but the basic premise is that you no longer need to think about the physical hardware. You just tell Microsoft what reliability and scalability you want. Do you want to store data in one part of the world but have it automatically mirrored on the other side of the world? Click a box. Do you want to change from a single-CPU machine to one with 16 CPUs? Click a box. Do you want to expand from 10 computers to 1000? Click a box. Or you can completely ignore what kind of computer is being used and just publish your web application to Azure, which will automatically scale it based on load and automatically fail over when the underlying hardware has issues.

It’s a big mind shift and it’s taking time for some companies to wrap their heads around it. At this point you basically have people in the old mindset who are still running their own data centers and trying to pay off the cost over 10-20 years, while their competitors are using the latest cloud technology and just paying for what they use. Cloud computing costs a fraction of what a full, custom datacenter would cost, and you’ll get eaten for lunch if your company doesn’t switch over because your competitors will be running so much faster and cheaper than you can.

Azure Compute
I’m in the “Azure Compute” team. It’s one of the fundamental building blocks of Azure. This team provides the virtual computers directly to customers and also to all of the other Azure products. If you need computing power in Azure, it comes through this team. As an example, there is a virtual machine in Azure hosting the MySQL database that powers this website.

Data Scientist
And finally, my title has changed from Software Engineer to Data Scientist. The “data science” term is a buzzword right now and I always ask people to define it when they use it because everyone means something different. But very generally, what it means on this team is that we take all of the telemetry data that we get from the system and we dig through it, analyze it, and run machine learning algorithms on top of it to find problems and make sure our customers are happily getting what they pay for. There’s also a lot of “data engineering” thrown in because in order to analyze petabytes of information, you have to do some serious work to get all that data moved into the right systems and curated into a format that makes sense. And once you get it all working, you have to keep it working.

This area of computer science has been around for a long time but it’s getting really hot right now. Technology has advanced to the point where storage and computing power are almost free (because of cloud computing) and everything around us is logging data. Depending on which report you read, the total amount of data on the planet doubles every year or two, and that pace is only accelerating. All of this data on its own is pointless, so it’s the job of people like me to dig through it and extract actionable information and value from it. It’s an exciting area of the field and I’m really happy to be doing it for Microsoft Azure!

Office 16

The next version of Office is now public. “But Ben, I already have Office and I hate buying new copies of it.” Never fear! Do you have an Office 365 subscription? If so, you get this upgrade for FREE. And if you don’t have a subscription, this is a good reason to consider it. Unless you have a single computer in your house and you never upgrade Office, the subscription is a great deal. You get five installs with free upgrades, 1TB on OneDrive, and some other perks for $100/year. There’s no more worrying about whether or not the upgrade is worth it for you. Pretty slick!

I normally don’t get too worked up about Office upgrades, but I’m really excited about Excel 2016. You may remember that I spent the vast majority of my time at Microsoft on Power Query and the projects that led up to it. Power Query now ships as a core feature of Excel 2016! Click the Data tab and look in the “Get & Transform” section. It’s awesome to have my code running on hundreds of millions (or billions?) of desktops. I know that it’s not a feature that most people will use, but it’s still cool to be included.

CrashPlan Update

In 2011 I signed up for CrashPlan and I’ve been a big advocate ever since. It’s a fantastic cloud backup service. You just install it on your machine, pay a small annual fee, tell it what folders to watch and you’re protected! There’s no limit on how much data you can upload. I have 3.9TB uploaded now!

As the size of my backup has increased, there have been two times when the app stopped working. Both times it was because the app was running out of memory while scanning all the files to figure out what had changed. There’s an easy fix which is well-documented on the CrashPlan site: you just edit an INI file and restart the service.
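
For reference, the change is a one-line tweak to the Java heap size. On my Windows machine the setting lives in the service’s INI file; the file name, parameter name, and value below are from memory, so follow CrashPlan’s support article rather than copying this verbatim.

```ini
; CrashPlanService.ini — raise the engine's max Java heap so it can
; track a multi-terabyte archive (3072m is what I'd try for ~4TB;
; it is my guess, not a CrashPlan recommendation).
Virtual Machine Parameters=-Xmx3072m
```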

If you’re not doing any kind of offsite backup yet, go to crashplan.com and sign up. It’s a tiny amount of money to be protected against all kinds of problems and data loss disasters.