Studio711.com – Ben Martens

Geek

Fantasy Football Super Stats

fantasyfootballstatsI wrote a tool for our fantasy football league that checks the rosters twice a week and sends out emails if you having anyone starting a game who is injured, on a bye week, or just questionable. That, along with lots of effort from each coach, has made our league pretty good in terms of starting full rosters every week.

I started playing around more with the Yahoo API and realized that I could download stats for every game. Not just that, I could download stats for all nine seasons that we’ve had this league! Give a data scientist a fun data set like this and see what happens. None of the stuff I did is very complicated in terms of real “data science”, but it sure was fun!

First of all, I have some corrections to make to the table of records. One of these is due to a stat correction from Yahoo and the other is because Yahoo’s online interface didn’t let me go back far enough to see the old record. Here is the updated table with changes in bold. It turns out that Tim didn’t break the record for highest score in a game or the highest player score in a game.

This Week Season All-Time
Highest Team Score Ben had 151.18 Tim 200.51 (Week 3) Tim had 206.94 in 2008
Lowest Team Score Andy had 75.31 Andy had 41.29 (Week 14) Andy had 41.29 (2015)
Biggest Blowout Luke beat Andy by 42.02 Ben beat Dad by 111.43 (Week 8) Luke beat Andy by 113.02 (2010)
Closest Win Austin beat Logan by 16.44 Ben beat Andy by 2.46 (Week 7) Jim beat Ben by 0.12 (2012)
Highest Scoring Player Kirk Cousins had 40.20 as a free agent. Drew Brees had 58.30 on Tim’s bench (Week 8) Peyton Manning had 60.28 for Andy (2013)
Longest Active Winning Streak Austin has a 4 game winning streak. Ben had an 8 game winning streak (Week 8) Micah (2011) and Ben (2015) had an 8 game winning streak
Longest Active Losing Streak Jim has a 4 game losing streak. Andy had a 8 game losing streak (Week 14) Kyle had a 14 game losing streak (2011)

Ok, but that’s pretty boring. What else can we answer? How about who had the most points on their bench? Instead of just counting the total, let’s look at the percentage of the total team’s points that sat on the bench. This stat is mostly just a curiosity. You could have a lot of points on the bench because you started the wrong people or because your team was really strong.

Tim Scherschel 36.0%
Dallon Martens 32.4%
Ben Martens 32.3%
Logan Brandt 32.1%
League Total 31.7%
Luke 31.2%
Jim 30.9%
Austin 30.0%
Andy Daniels 27.6%

Which positions produce the most average points in a game?

QB 22.3
WR 13.2
RB 11.4
DEF 10.7
TE 10.1
K 8.3

I broke the all time record for most points in a season. But how does it compare to everyone else over the past seasons?

scoresbyyearbycoach

How does the average number of points per team vary by year? Does it all average out or is there more scoring in some years?

Which players have generated the most points since 2007 in our league?

Drew Brees 3089.66
Tom Brady 2940.34
Aaron Rodgers 2825.2
Peyton Manning 2602.78
Philip Rivers 2300.68
Tony Romo 2106.5
Ben Roethlisberger 2061.72
Adrian Peterson 2026.62
Eli Manning 1969.96
Matt Ryan 1900.98

Who were the top 3 players at each position in our league this year?

Tom Brady Ben Martens QB 410.94
Russell Wilson Luke QB 377.18
Cam Newton Austin QB 376.46
Antonio Brown Austin WR 296.93
Julio Jones Logan Brandt WR 287.70
Odell Beckham Jr. Ben Martens WR 264.00
Adrian Peterson Jim RB 232.40
Devonta Freeman Ben Martens RB 205.60
Doug Martin Dallon Martens RB 201.50
Rob Gronkowski Austin TE 216.94
Greg Olsen Tim Scherschel TE 186.30
Tyler Eifert Logan Brandt TE 152.40
Stephen Gostkowski Dallon Martens K 167.00
Steven Hauschka Tim Scherschel K 127.00
Mason Crosby Logan Brandt K 116.00
Arizona Dallon Martens DEF 200.86
Denver Ben Martens DEF 196.94
Seattle Tim Scherschel DEF 180.70

Who has had the easiest schedule for the history of our league?
Tim. Dad has had the hardest schedule. But really, they are all very similar. Nobody deviated more than 2% from the mean.

Who had the most close wins in the last four years? This is a count of the number of times that each person won by less than 10 points.

Jim 8
Andy Daniels 8
Logan Brandt 7
Tim Scherschel 7
Dallon Martens 7
Austin 5
Ben Martens 5
Luke 5

There are so many more questions on my list to answer, but frankly I’ve been spending a lot of late nights digging through this. Hopefully if I get this post out, I’ll stop for a while and go to sleep!

If you’re interested in this dataset, let me know. I can either ship it to you in Excel or I can give you access to the SQL database and you can run your own queries. I hope to keep it running next year as the season progresses. I think I’ll also be able to use it to generate the Weekly Awards table each week which will make my blog posts easier.

Thanks to everyone for a great season! Go Seahawks!

Flash Fill

If you spend much time in Excel, you have probably found yourself doing some repetitive tasks to clean up a column of data. Or maybe instead of doing it manually, you might craft some formulas to do it for you. If you’re just doing a one-off cleanup, make sure you consider the Flash Fill feature in Excel. I recently used this in a demo to my team at work and many people in the room were surprised by it so I figured there were probably a few of you reading this who could benefit from it as well.

The basic idea is that you have one or more input columns and an output column. You provide an example or two of what the output should be and then you click the Flash Fill button. A magical wizard in your computer figures out the pattern and repeats it for the rest of the column. It really does feel like magic the first couple times you use it. Sometimes the algorithm gets it wrong for a few cells so you can just make a correction and it will update it’s algorithm and redo any cells that need it.

This video has a bunch of great examples:

If that’s not enough for you, check out prose-playground.cloudapp.net. The tech behind Flash Fill is open sourced out of Microsoft Research and this is one of the sample sites they have created. If you want to use the SDK, you can get it from GitHub.

Working With ISOs

isofileWe don’t get a lot of software on discs anymore, but it does still happen. What do you do when you need to install that software on a computer without a DVD drive? You could buy a USB DVD drive but that’s a pain to hook up. Instead, consider turning that DVD disc into an “ISO” file. It’s a single file that fully represents the disc. With modern versions of Windows, you can “mount” the ISO file and Windows treats that file just like it would if you had put the actual disc into the computer. If your computer doesn’t have this mounting feature, there are plenty of free software solutions. Virtual Clone Drive is a great one.

How do you make an ISO? Grab a free copy of ImgBurn. It has tons of options including making an ISO from a disc and burning an ISO back to a disc.

This might seem slightly geeky but it’s a great way to get rid of all those discs sitting in the closet. Just rip them to ISO files and then toss the discs. You can easily access them from any computer in your house and if you really need a disc again, you can burn it.

Cloud Computing Job Description

datacenter(Note: I generally try to avoid specifically mentioning my company name because then I get picked up by search engines and news articles and incorrectly quoted as an official company resource. Let me be clear that none of this is an official company statement. I’m just explaining things from my current, personal viewpoint.)

I’ve worked in some interesting groups in Microsoft. The most recent was probably the easiest to explain. I’d just tell people to go to powerbi.com and they could either watch a couple videos or even sign up for a free account. Now I’ve made the switch to Azure Compute and I’m in more of a data scientist role so I thought I’d take some time to explain what that all means.

Azure and Cloud Computing
Azure is Microsoft’s cloud offering. It’s a direct competitor to Amazon Web Services (AWS). It’s hard to know who is bigger because these companies don’t publish lots of numbers, but it’s probably safe to assume that AWS has more customers than Azure. Watch for that to change in the future though. Azure is growing by leaps and bounds.

That’s great, but what is “the cloud”? In the past, businesses have operated their own data centers. It’s generally a building full of racks and racks of computers with blinky lights and cables all over the place. Each computer is specially built for a special purpose and maybe there’s an identical one sitting right next to it in case the first one fails. There is a huge overhead cost of the building (including cooling, electricity, maintenance, etc) as well as the need to amortize the cost of all the equipment over many years. Companies often end up running on old equipment because they’re still trying to pay it off. And oh yeah, even if you can get a datacenter running smoothly and paying for itself, if you need to be concerned about compliance, certifications and international law, be prepared to create a new data center in a variety of countries or even in specific countries. There are all kinds of laws that say, for example, you cannot move user data out of Europe if it was created there. How’s that going to work if your only datacenter is in the United States? There have been huge international changes in this area after all the Snowden revelations. And oh yeah, even if you do somehow manage to build out a global network of datacenters, some countries won’t let you do business there unless they run the datacenter themselves! You don’t even get a key to the building that you paid to construct. It quickly becomes a nightmare to manage on your own.

Cloud computing changes that whole model. Microsoft, Amazon, and others are building enormous data centers around the world. Instead of designing and buying all of your own equipment, you just rent time from the provider. In Microsoft’s case, that is the Azure service. There are many different products inside of the Azure world, but the basic premise is that you no longer need to think about the physical hardware. You just tell Microsoft what reliability and scalability you want. Do you want to store data in one part of the world but have it automatically mirrored on the other side of the world? Click a box. Do you want to change from using a single CPU machine to one with 16 CPUs? Click a box. Do you want to expand from 10 computers to 1000? Click a box. Or you can even completely ignore what kind of computer is being used and just publish your web application to Azure. Azure will then help you automatically scale based on load and automatically failover when underlying hardware has issues. It’s a big mind shift and it’s taking time for some companies to wrap their heads around it. At this point you basically have people in the old mindset who are still running their own data centers and are trying to pay off the cost over 10-20 years while their competitors are using the latest cloud technology and are just paying for what they use. Cloud computing costs a fraction of what a full, custom datacenter would cost and you’ll get eaten for lunch if your company doesn’t switch over because your competitors will be running so much faster and cheaper than you can.

Azure Compute
I’m in the “Azure Compute” team. It’s one of the fundamental building blocks of Azure. This team provides the virtual computers directly to customers and also to all of the other Azure products. If you need computing power in Azure, it comes through this team. As an example, there is a virtual machine in Azure hosting the MySQL database that powers this website.

Data Scientist
And finally, my title has changed from Software Engineer to Data Scientist. The “data science” term is a buzzword right now and I always ask people to define it when they use it because everyone means something different. But very generally, what it means on this team is that we take all of the telemetry data that we get from the system and we dig through it, analyze it, and run machine learning algorithms on top of it to find problems and make sure our customers are happily getting what they pay for. There’s also a lot of “data engineering” thrown in because in order to analyze petabytes of information, you have to do some serious work to get all that data moved into the right systems and curated into a format that makes sense. And once you get it all working, you have to keep it working.

This area of computer science has been around for a long time but it’s getting really hot right now. Technology has advanced to the point where storage and computing power are almost free (because of cloud computing) and everything around us is logging data. Depending on which report you read, the total amount of data on the planet doubles every year or two. And the rate at which it doubles is rapidly increasing. All of this data on it’s own is pointless so it’s the job of people like me to dig through it and extract actionable information and value from it. It’s an exciting area of the field and I’m really happy to be doing it for Microsoft Azure!

Office 16

powerqueryexcel2016The next version of Office is now public. “But Ben, I already have Office and I hate buying new copies of it.” Never fear! Do you have an Office 365 subscription? If so then you get this upgrade for FREE. And if you don’t have a subscription, this is a good reason to consider it. Unless you have a single computer in your house and you never upgrade Office, the subscription is a great deal. You get five installs with free upgrades, 1TB on One Drive, and some other perks for  100/year. There’s no more worrying about whether or not the upgrade is worth it for you. Pretty slick!

I normally don’t get too worked up about Office upgrades, but I’m really excited about Excel 2016. You may remember that I spent the vast majority of my time at Microsoft on Power Query and the projects that led up to it. Power Query now ships as a core feature of Excel 2016! Click the Data tab and look in the “Get & Transform” section. It’s awesome to have my code running on hundreds of millions (or billions?) of desktops. I know that it’s not a feature that most people will use, but it’s still cool to be included.

CrashPlan Update

crashplan-logoIn 2011 I signed up for CrashPlan and I’ve been a big advocate ever since. It’s a fantastic cloud backup service. You just install it on your machine, pay a small annual fee, tell it what folders to watch and you’re protected! There’s no limit on how much data you can upload. I have 3.9TB uploaded now!

As the size of my backups have increased, there have been two times when the app has stopped working. Both times were because the app was running out of memory to look at all the files and figure out what had changed. There’s an easy fix which is well-documented on the CrashPlan site. You just edit an INI file and restart the service.

If you’re not doing any kind off site backup yet, go to crashplan.com and sign up. It’s a tiny amount of money to be protected against all kinds of problems and data loss disasters.

Moving From Groove To Spotify

spotifylogoTyla and I have been enjoying our Groove (Xbox Music) subscriptions. They work well on our phones, computers, Xboxes, etc. I have three main complaints about the service:

  1. I use it to play music through our whole-house FM transmitter. It crashes/stops pretty frequently and I have to restart it about once per day.
  2. There is no sharing of playlists. This is a huge negative for me. Sometimes I want to make my own lists, but very often I just want a curated list that somebody else generated. Or maybe I want to use the playlist that Tyla made of Elijah’s favorite songs. There’s no way to import/export or share playlists in Groove.
  3. There’s no family plan. Everyone buy’s their own subscription at full price.

With these thoughts in mind, I think we’re going to join the rest of the planet and switch to Spotify when our current subscriptions are up later this year. The first family member pays full price and additional family members are half price. It’s easy to switch but the only real pain point is moving our playlists over to Spotify. Manually recreating them would take forever.

There’s no great way to do this, but if you’re a geek, you can probably follow my vague instructions:

  1. Fire up Chrome. Browse to music.windows.com and open your playlist.
  2. Scroll down to make sure the whole playlist loads (it only grabs 50 songs at a time.)
  3. Press F12 to open the developer tools
  4. In the Elements tab, right click on the root HTML node and choose Copy.
  5. Paste that into notepad and save the file.
  6. Now you need to parse out the artist and song info. There are tons of possibilities here, but I used Power Query in Excel. Basically I filtered for the types of rows that had Artist and Titles. I stripped out all the extra info and voila. I built out a table with two columns: Artist and Title.
  7. I copied that table and went to Ivy. That site lets you create a Spotify playlist. It looks up each one of those songs in the Spotify catalog. You’re probably not going to find all of your songs in the Spotify catalog since the catalogs are different, but most of them will probably work.
  8. Ivy gives you a button to click that copies the playlist to your clipboard. Then you create a new playlist in Spotify and paste your clipboard into that playlist. Voila!

Easy? Nope. Hacky? Yep. But it saved me a bunch of time. There is an SDK for Groove but it doesn’t look like you can easily get playlist contents for a specific user. I started trying to code it up, but this was faster and I’ll never need to do it again anyway.

Hyperlapse

A while back, Microsoft Research announced a technology called Hyperlapse. The idea is that it takes something like bumpy GoPro video, smooths it out and speeds it up. I didn’t play around with it much until the other day when I installed a preview of the professional desktop application. I dropped in a ski video from a 2013 trip to Crystal with AndyM. The result was pretty impressive! Sure it’s not going to make a great ski video, but it’s cool to see the “flow” down the mountain. It almost looks like the camera was flying above the ground.

Long Timelapse Tips

GoPro-HD-Hero-2During our Indiana trip, I made a six-day timelapse. That was by far the longest one I’ve ever attempted. How did it work? Here are some tips:

  • The GoPro is a great device for this. You can set it to snap a photo every 2, 5 or 10 seconds. Generally I do 2 seconds but since this one was so long I did 5 seconds since I knew I would speed it up a lot anyway.
  • Set the GoPro to the lowest resolution possible. On my camera that is 5 megapixels. Even that is a waste since you’re going to end up resizing down to 1080p anyway. The lower resolution will help you save room on the memory card and also saves battery life.
  • Don’t skimp on the memory card. 32 and 64GB cards are super cheap. Just get a big one and remove this potential problem from your setup.
  • Get a portable USB battery pack. I use this one, but there are lots of options. Get one that takes AA batteries. When you run out of battery, you just slap new batteries in there and you’re instantly ready to go instead of waiting for a recharge. I use and recommend Eneloop AA’s and the Lacrosse BC-700 charger. By using the battery pack to keep the internal GoPro battery charged up, I’m able to confidently let the camera run for 4 hours and it will probably go a lot longer. I changed out the batteries at lunch and at night. That way I didn’t have to worry about them dying on me.

When I was done I had over 30,000 photos. How do you process all of those into a movie? There are LOTS of options, but here’s how I do it (for free):

  • Install IrfanView. It has a lot of features, but I use it to batch resize and crop all of the photos down to 1920×1080. I also rename the photos to something like image#####.jpg so that there is a sequential number for every image.
  • Install ffmpeg. Decipher the command line parameters to build up the video that you want. Here’s the line that I use:”c:\program files\ffmpeg\bin\ffmpeg” -i image%05d.JPG -r 30 -s hd1080 -vcodec libx264 -y out.mp4
  • Techincally you could be done at this point, but I usually take that MP4 into Adobe Premiere Elements and crop it up, speed it up, slow it down, etc.

Software Toolkit

toolsIt’s been a few weeks since I last wiped my computer and started with a fresh install. When I do this, I only install software as I need it, so at this point, if I have something on my machine, it’s because I use it pretty regularly. I thought it might be interesting to share the list as a bit of a recommendation/tip post. I’ll leave out specific apps for my printer, scanner, etc.

  • 7-Zip
  • Adobe Photoshop Elements 10
  • Adobo Photoshop Lightroom 3.6
  • Adobe Premiere Elements 12
  • Foxit Reader
  • Google Chrome
  • HandBrake
  • LastPass
  • Microsoft Azure SDK v2.7
  • Microsoft Office Professional Plus 2013
  • Microsoft Power Query for Excel
  • Microsoft SQL Server Management Studio
  • Microsoft Visual Studio Ultimate 2013
  • MySQL Workbench
  • Notepad++
  • Python 2.7.6
  • SketchUp 2015
  • VLC media player