Studio711.com – Ben Martens

Geek

Compass Declination

I was replaced the battery in my fancy Sunnto Vector watch which meant that I had to recalibrate some of the sensors including the compass. Calibrate a compass? We all learn in grade school that magnetic north isn’t exactly the same as true north, but it’s generally close enough that we don’t think about it. In the midwest it’s pretty close to the same, but out towards the coast the difference is significant. Here in Seattle if you follow a compass and walk north, you’ll be heading almost 17 degrees too far to the east!

Even more interestingly, the declination changes from year to year. In Seattle, it is currently decreasing by about a degree every 6 years. I strongly recommend that you click this Wikipedia link to watch an animation of how the magnetic field has changed over the last 400 years.

You probably don’t need to think about this in your day to day activities, but if you spend time outdoors, it’s good to have in the back of your mind in case you find yourself relying a compass in a survival situation.

Computer Build

A couple weeks ago, I mentioned I was about ready to pull the trigger on the new computer. That day finally arrived and I carved out some time to put it together on Wednesday.

The final parts list looks close to what I said in that post.

Core i7-860
GigaByte P55A-UD3
PNY NVidia GTS250 1GB
Corsair XMS3 8GB DDR3 1600
Corsair TX CMPSU-750TX 750W
Nippon Labs Delux 3.5″ Internal All In One Card Reader/Writer
Antec Two Hundred Case

The only thing not in that list is the DVD drive which I took out of the old computer and the hard drives which I had laying around. The OS is on a 160GB drive and the two 250GB drives are in a RAID0 config. Since the whole computer is backed up daily to the Windows Home Server, I’m not concerned about the reduced reliability of RAID0.

The whole process took me about three hours to finish. I could easily do it again in less than an hour, but I was taking my sweet time to be careful not to destroy my ~$1000 worth of parts. I had about a 45 minute panic attack when it didn’t boot. The fans would spin, a couple lights came on and then it would die. It would continue to repeat that cycle until I shut the power off.

Finally I figured out that there was a second power plug that needed to go into the motherboard. Once I plugged that in, everything worked and I was in business!

I built this computer to handle HD video so while I was building it, I filmed the whole thing. Tonight I edited the video and it was fantastic. I wish I had a before and after demo for you, but previously I’d get maybe 1 frame every second or two and every time I started or paused the video it would take a few seconds to respond. With this computer, it plays back at regular speed and is extremely responsive. Rendering was also a breeze. What used to take a couple hours to render now takes about 20 minutes.

The real test, of course, is whether or not I’d try this again. Initially I would have said no but I think the nervousness has worn off and I might be willing to give it another shot. There was a definite cost savings over buying it prebuilt from Dell and I got higher quality parts.

And Ken, I should have listened to you and gotten a modular power supply. But even with that, I still would have a mess of cables running everywhere. I need to open it back up and break out the zip ties.

I sped up the video so the whole process takes less than 5 minutes. If you go to YouTube you can watch it in HD.

DAWG Part 2

This post assumes that you’ve read at least the radix tree portion of the post from two days ago. Given that tree built from my dictionary of 100,000 words, what word would require the most nodes to be touched? One key is that the children of every node are sorted alphabetically.

So thinking logically about this, it’s probably going to be a long word. It’s probably going to contain a good number of letters that are towards the end of the alphabet. “aadvark” for example, is not going to work because right away we find the string “aa” after looking at only two nods. “zoo” on the other hand takes 26 nodes just to find the z.

I solved this by pumping all 100,000 words through the radix tree and counting how many nodes I touched for each word. The winner was “superstructures". Here are the nodes I touched at each level:

a b c d e f g h i j k l m n o p q r s
a c e f g h i k l m n o p q s t u
a b c d e f g i k l m n p
e
r
a b c e f g h i l m n o p r s
a c e i m o p t
a i o r
a e o u
c
t
u
r
a e
s

Now you have learned your random piece of trivia for the day. I can’t imagine the situation where this would be useful knowledge. Also, this answer would vary depending on the dictionary you use.

PS. I haven’t had a ton to write about so you get stuck with whatever thoughts fly through my brain. Scary, eh? I wonder what normal people think about.

Directed Acyclic Word Graph

If that title wasn’t warning enough, you should know that this is going to be a pretty geeky post. I thought I’d do a little writeup of the project I’ve been toying with for the last couple weeks and I think a handful of you will enjoy it.

It all began when I started thinking about how you would write a program to find all the words on a 5×5 grid of letters (like Boggle.) There are many apps that do this already, but how do they work? How do you take a dictionary of 100,000 words and quickly apply to it to a bazillion possible letter combinations on a 5×5 grid and find all the valid words?

Radix Tree

I started off by reading in all the words and storing them in a radix tree. There is one root node and then the children of that root node are the distinct first letters in your word list. For a full dictionary you’ll probably end up with that first level of the tree having 26 nodes, one for each letter. This pattern continues down the tree so if you grab any node in the tree, the nodes above it correspond to the prefix of a words and all possible routes down through the children correspond to valid suffixes. Take any path from top to bottom and you have a word in the dictionary.

Each node also contains a flag that we’ll call Terminal which is true if that node can end a word and false if it cannot. So in the example on the right, the “n” in “one” would have Terminal = true because “on” is a valid word.

This turns out to be a very quick way to determine if a string of letters is a valid word. But there is a lot of extra data in the tree. For example, in the diagram on the right, the ending letters in “one” and “three” are identical nodes. There’s room for improvement.

Directed Acyclic Word Graph

As I was thinking about ways to compress this tree a little bit, DaveM pointed me to DAWGs (Directed Acyclic Word Graphs.) To create the DAWG, I create an array of pointers to all the nodes in the tree sorted by their depth in the tree. Starting at the leaf nodes (depth = 0) and moving up to the root, I apply the following algorithm

For each pair of nodes at depth x, if the two nodes have the same character and the two nodes have the same terminal flag and the two nodes have identical subtrees then remove node2 from its parent and add node1 to the childlist of the parent of node2.

Done correctly in a large word list, you can throw away a half to two thirds of the nodes in you tree!

Storing the Graph

It takes a while to build and compress the graph so it makes sense to serialize the tree so that I can just read it in from disc and be ready to go. I encoded each node as a 32-bit number. The first 5 bits represent the character (enough to store the 26 letters in the alphabet), 1 bit represents the terminal flag on/off value, and the last 26 bits represent the array index of the list of the node’s children. I flatten the tree into a jagged array where each node has a slot in the array and the value of that slot is a series of 32-bit numbers which represent it’s children with the above encoding scheme. The algorithm looks like this:

Get a list of the nodes in breadth first order.

Foreach each node in the BFS list look at each of the node’s children. If the child does not have a slot in the array, create an array slot and store that index in the child node. Encode the child node into the 32-bit number and append it to the list of the parent node’s children.

So for any array index N, the list of 32 bit numbers in that array slot represent the children of that node.

Writing this out to disc is pretty straightforward. The only special steps I took were to start the file with the length of the array and then when I write each array slot, I start with the number of children at that position. This speeds things up when I read in the file because I can allocate the right sized array.

Putting It to Use

Now it’s time to put the DAWG to use. I begin by generating a random 5×5 board. I use a weighting scheme to favor the more common letters and produce boards with more words. I then start at look at each node and follow this algorithm:

If this spot hasn’t been visited and the current string plus the letter on this spot is a valid substring in the DAWG, mark this spot as visited and search again looking in all 7 directions. If the current string is a valid word in the DAWG, add it to the list of found words.

Once that’s done I sort the found word list for quicker lookup. Now the player can start the game. When they punch in a word, it’s a simple lookup in that sorted list to determine if it’s a valid word. Since I know what all the valid words are, I can tell the user they have found X% of the words and then show the full word list at the end of the game.

Performance

The speed of the DAWG generation and serialization doesn’t really matter since it only happens once, but I can plow through 100K words in roughly 30 seconds. I’m writing this game for my phone which has a 528MHz processor in it. I can read in the DAWG from disc and load the array into memory in about 9 tenths of a second. Generating the random board and finding all the words happens in only 1 tenth of a second!

The next step is to give this game a little nicer GUI and add a timer, but I always get sidetracked playing the game instead of trying to improve it. This might be as good as it gets before I move on to the next hobby project.

Ski Geek

Somehow, every activity I participate in is turned into a geek project. Skiing is no different. This season, I have been carrying around the Garmin GPS that I purchased earlier in the year. It keeps a signal inside my coat and dutifully tracks my position every few seconds.

I can load this into my National Geographic topographical map program and get an interesting view, but I wanted more. I want to know my top speed, how much time I spent in the lift lines, how fast the lift was moving, which lifts I rode the most, etc.

To that end, I’ve started writing a program to analyze the GPS data. The raw statistics are fairly simple and I was able to get a display churned out pretty quickly. Lately I’ve been stuck on trying to automatically figure out when I was on a lift. On the surface, it seems easy: you’re on a lift when you’re going up. That’s not always true. Runs have rises in them and lifts have dips in them. So then I tried to say that any time I’m heading in the same direction for X miles and Y vertical feet then I’m on a lift. Even that has problems. What happens when you get off a lift and keep skiing straight down the backside of the hill? What happens when you get a couple errant GPS points that aren’t in line with the lift? There is enough drift in the data to make it very complicated. If I can’t get the automatic solution figure out, I’m going to have the user tell me where the lifts are the first time and then I’ll save that data. I want to move on to getting either a 2D or 3D map working next. After that I’ll work out a good way to display all the statistics about the day and each individual run.

If you’re interested in seeing the code and/or helping out, it’s all available on codeplex.

The picture below shows the track from my last ski day at Crystal Mt in the National Geographic software. My software will end up looking something like this but with more data and information on the screen.

Office Decoration

When I moved into the new office, I came up with an interesting way to decorate the back wall. I used the National Geographic Topographic maps software to create a bunch of 8.5×11 PDFs. I printed everything off at Kinko's and then painstakingly placed each sheet on the wall with thumbtacks.

The whole process took forever. If I had it to do over again, I would pay extra to have it done on a plotter or some other large format. This would have saved a lot of time and also I think it would have looked better because some of the sheets didn't quite line up when I tried to place them on the wall.

That being said, I still think it looks pretty good. I get a lot of comments from people as they walk by my office. The map goes from Seattle in the lower left corner, east to Stevens Pass, and then north up to almost the north end of Whidbey Island. The smaller map on the side wall shows Mt. Rainier. As I go on various trips, I print off a small picture and stick it on the map.

I don't think I'll do it again (at least in this manner), but I'm happy with the way it came out.

Geohashing

Rachel and I have been chatting about trying to do a little geocaching instead of regular hiking. Today's XKCD comic presents an idea called geohashing. There is a formula to figure out a geohash location. The inputs to the formula are the date, that date's (or most recent) Dow opening, and the integer lat/lon of your current location. The formula includes an MD5 hash and is pictured below. There is an app already written to calculate each meeting spot and there is a wiki with more info.

I'm not going to bother to explain what all that means, but if you need more help, you can click on the links for extra learnin'.

Tivo Hack

Last night, I hacked the Tivo and lived to tell about it. The Tivo HD comes with a 160GB drive which is good for 20 hours of high def recording (184 standard def hours.) That's quite a bit, but on busy sports weekends, 20 hours fills up pretty quickly. Plus, this is supposed to be a pretty easy hack and I wanted to give it a shot.

I ordered a 500GB Western Digital hard drive from NewEgg along with an external SATA USB drive enclosure. Once I had the material, it was pretty straightforward. I opened up the Tivo and took out the hard drive (make sure you have a Torx wrenches!) I connected the 500GB drive to my system via the USB enclosure and connected the Tivo drive to a SATA connection inside my computer. I fired up WinMFS and ran mfscopy. I goofed a little and forgot to delete all the old TV shows from the Tivo so the mfscopy command took the time to copy over all those recorded shows. It took about an hour, but it would have gone much faster if I had remembered to delete the shows first.

After mfscopy finished, I made a backup of the original Tivo drive (something I should have done at the beginning.) I'll keep the Tivo drive untouched for a while to make sure the new one is ok, but if it works then I'll wipe the old Tivo drive and use it for regular storage. Theoretically I can restore that backup at any time.

The mfscopy is supposed to copy over all the Season Passes, Cable Card settings, etc. It's a bit for bit copy of the whole drive. I really didn't want to go through the Cable Card pairing again so I was hoping that it really did work. I put the 500GB drive into the Tivo and fired it up. It worked perfectly! My Tivo now says that I have 64 hours of HD space and 607 hours of standard def space. It's going to be hard to fill that up unless I leave the house for a month.

There are some firmware changes you can make to hard drives to make them run slower and be a bit quieter. I didn't do any of those and it turned out ok. I do hear the hard drive a little, but that doesn't bother me. If you're really concerned about the noise, find that tool and/or pay a little extra for one of the Seagate DB35 drives. TivoCommunity.com has all the info you need.

All in all this was a very simple "hack." After Tim saw how it worked he said, "That's not even a hack." He's right. It's really simple as long as you're willing to void your warranty and risk destroying your investment. Beyond that you only have to click a few buttons and wait. Go for it!

Web Cam Time Lapse

I've done a couple time lapse videos recently, and I think I'm becoming a bit of a time lapse fan. This latest app combines time lapse photography with the skiing web cams that I watch every day (via my Vista gadget!)

The app watches a list of web cams which you provide and saves a copy of the image at intervals which you specify. My plan is to let a bunch of these run for the next few months and then stitch them together into some type of movie. Maybe none of them will turn out well, but it was an interesting little application to write nonetheless.

The program writes out a file called usersettings.xml and that contains all the information that you provided when you set up the web cams via the Add button. If you need to edit any of the information you provided, you'll have to close the program, edit the xml, and then restart. New threads are spawned to download all the images. There are a lot of potential holes in the program, but it has been running fine on my machine for a few days.

If you've been looking for something similar, feel free to check it out. The source code and binaries are posted on Codeplex at http://www.codeplex.com/webcamtimelapse.