(Note: I generally try to avoid specifically mentioning my company name because then I get picked up by search engines and news articles and incorrectly quoted as an official company resource. Let me be clear that none of this is an official company statement. I’m just explaining things from my current, personal viewpoint.)
I’ve worked in some interesting groups at Microsoft. The most recent was probably the easiest to explain. I’d just tell people to go to powerbi.com and they could either watch a couple of videos or even sign up for a free account. Now I’ve made the switch to Azure Compute and I’m in more of a data scientist role, so I thought I’d take some time to explain what that all means.
Azure and Cloud Computing
Azure is Microsoft’s cloud offering. It’s a direct competitor to Amazon Web Services (AWS). It’s hard to know who is bigger because these companies don’t publish lots of numbers, but it’s probably safe to assume that AWS has more customers than Azure. Watch for that to change in the future though. Azure is growing by leaps and bounds.
That’s great, but what is “the cloud”? In the past, businesses operated their own data centers. A data center is generally a building full of racks and racks of computers with blinky lights and cables all over the place. Each computer is built for a specific purpose and maybe there’s an identical one sitting right next to it in case the first one fails. There is a huge overhead cost for the building (including cooling, electricity, maintenance, etc.) as well as the need to amortize the cost of all the equipment over many years. Companies often end up running on old equipment because they’re still trying to pay it off.
And oh yeah, even if you can get a data center running smoothly and paying for itself, if you need to be concerned about compliance, certifications and international law, be prepared to build new data centers in a variety of countries, sometimes in specific ones. There are all kinds of laws that restrict, for example, moving user data out of Europe if it was created there. How’s that going to work if your only data center is in the United States? There have been huge international changes in this area since the Snowden revelations. On top of that, even if you do somehow manage to build out a global network of data centers, some countries won’t let you do business there unless they run the data center themselves! You don’t even get a key to the building that you paid to construct. It quickly becomes a nightmare to manage on your own.
Cloud computing changes that whole model. Microsoft, Amazon, and others are building enormous data centers around the world. Instead of designing and buying all of your own equipment, you just rent time from the provider. In Microsoft’s case, that is the Azure service. There are many different products inside of the Azure world, but the basic premise is that you no longer need to think about the physical hardware. You just tell Microsoft what reliability and scalability you want. Do you want to store data in one part of the world but have it automatically mirrored on the other side of the world? Click a box. Do you want to change from using a single-CPU machine to one with 16 CPUs? Click a box. Do you want to expand from 10 computers to 1000? Click a box. Or you can even completely ignore what kind of computer is being used and just publish your web application to Azure. Azure will then help you automatically scale based on load and automatically fail over when underlying hardware has issues. It’s a big mind shift and it’s taking time for some companies to wrap their heads around it. At this point you basically have people in the old mindset who are still running their own data centers and are trying to pay off the cost over 10 to 20 years while their competitors are using the latest cloud technology and are just paying for what they use. Cloud computing can cost a fraction of what a full, custom data center would cost, and you’ll get eaten for lunch if your company doesn’t switch over because your competitors will be running so much faster and cheaper than you can.
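To make the “scale based on load” idea a little more concrete, here is a toy sketch in Python. It is not how Azure actually decides anything. The function name, the 60% CPU target, and the instance limits are all made up for illustration. It just shows the basic arithmetic: if your machines are running hotter than you’d like, rent more of them, and if they’re mostly idle, give some back.

```python
def desired_instances(current: int, avg_cpu_percent: int,
                      target_cpu_percent: int = 60,
                      min_instances: int = 1,
                      max_instances: int = 1000) -> int:
    """Toy autoscaling rule (hypothetical, not Azure's real logic).

    Returns how many machines to run so that the average CPU usage
    lands at or below target_cpu_percent, within the allowed range.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_cpu_percent < 1:
        raise ValueError("target_cpu_percent must be at least 1")

    # Total work is roughly current * avg_cpu_percent. Spread it over
    # enough machines to hit the target, rounding up (integer ceiling).
    total_load = current * avg_cpu_percent
    needed = (total_load + target_cpu_percent - 1) // target_cpu_percent

    # Never go below the minimum or above the maximum you agreed to pay for.
    return max(min_instances, min(max_instances, needed))
```

For example, 10 machines averaging 90% CPU would become 15 machines under this rule, and 10 machines averaging 30% CPU would shrink to 5.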
I’m on the “Azure Compute” team. It’s one of the fundamental building blocks of Azure. This team provides the virtual computers directly to customers and also to all of the other Azure products. If you need computing power in Azure, it comes through this team. As an example, there is a virtual machine in Azure hosting the MySQL database that powers this website.
And finally, my title has changed from Software Engineer to Data Scientist. The “data science” term is a buzzword right now and I always ask people to define it when they use it because everyone means something different. But very generally, what it means on this team is that we take all of the telemetry data that we get from the system and we dig through it, analyze it, and run machine learning algorithms on top of it to find problems and make sure our customers are happily getting what they pay for. There’s also a lot of “data engineering” thrown in because in order to analyze petabytes of information, you have to do some serious work to get all that data moved into the right systems and curated into a format that makes sense. And once you get it all working, you have to keep it working.
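If “dig through telemetry to find problems” sounds abstract, here is about the smallest possible version of the idea in Python. To be clear, this is a simple statistical check I’m using as an illustration, not the team’s actual tooling or a machine learning algorithm. The machine names, the error counts, and the threshold are all invented. It flags any machine whose reading is far from the median, measured in units of the median absolute deviation (MAD).

```python
from statistics import median


def flag_outliers(readings: dict[str, float], threshold: float = 5.0) -> list[str]:
    """Return the names of machines whose reading is unusually far from the median.

    Hypothetical illustration only: `readings` maps a machine name to one
    telemetry value (say, errors per hour). Distance is measured in units
    of the median absolute deviation (MAD).
    """
    if not readings:
        return []

    mid = median(readings.values())
    mad = median(abs(v - mid) for v in readings.values())

    if mad == 0:
        # At least half the machines report the same value, so anything
        # that differs at all stands out.
        return sorted(name for name, v in readings.items() if v != mid)

    return sorted(name for name, v in readings.items()
                  if abs(v - mid) / mad > threshold)


# Made-up numbers: four healthy machines and one that is clearly unhappy.
sample = {"vm-a": 10, "vm-b": 12, "vm-c": 11, "vm-d": 95, "vm-e": 9}
suspects = flag_outliers(sample)
```

The real work is doing something like this across petabytes of data instead of five numbers, which is where the data engineering comes in.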
This area of computer science has been around for a long time but it’s getting really hot right now. Technology has advanced to the point where storage and computing power are almost free (because of cloud computing) and everything around us is logging data. Depending on which report you read, the total amount of data on the planet doubles every year or two, and that pace keeps picking up. All of this data on its own is pointless, so it’s the job of people like me to dig through it and extract actionable information and value from it. It’s an exciting area of the field and I’m really happy to be doing it for Microsoft Azure!