Saturday, May 7, 2011

Matters of Understanding Earthquakes (or, Understanding the Role of Suitable Data Representation)

I've probably moaned about this enough already, but it's not nice to be sitting down to lunch and suddenly get jolted by a quake rolling through. That's exactly what happened today: a nice sharp north-south "wide" jolt, like having a train run into your house, after a relatively peaceful few days.

If only we could reliably have some indication (other than 2% chance of > 5.0, 40% chance of > 4.0, etc. etc.) of when these might strike...


Geologists in general say that we cannot predict the damned things. Well, perhaps they are correct: we cannot predict them using "current techniques".

Why not? 

As anyone who's done any considerable work in algorithms and/or AI should be able to tell you, problems can fall into several categories, one of which is "intractable" (i.e. unable to be solved "using currently known techniques"). A major reason this comes about is that we're simply going about it the wrong way, using the wrong representation of the problem.

Representation counts

As you've seen here before, the way you structure your data greatly influences not only how quickly you can get at it, but also what you can do with it and/or find out using it. By simply changing the representation of a problem, you can go from playing "blind men in a cave trying to figure out what an elephant looks like" to, say, "identifying whether you've got an elephant" (ok, maybe this is a bit far-fetched).
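To make that a bit more concrete, here's a minimal Python sketch with entirely made-up toy data (the numbers don't mean anything): the same set of events stored two ways, where one representation makes a simple per-day question a chore and the other makes it trivial.

    from collections import defaultdict

    # Hypothetical toy data: (day, magnitude) pairs, in no particular order.
    events = [(3, 4.1), (1, 5.0), (3, 3.2), (7, 4.4), (1, 2.9)]

    # Representation 1: a flat list. Answering "what happened on day 3?"
    # means scanning every event, every time.
    day3_flat = [m for (d, m) in events if d == 3]

    # Representation 2: the same data, bucketed by day. One reorganisation
    # up front, and per-day queries (and per-day stats) become trivial.
    by_day = defaultdict(list)
    for d, m in events:
        by_day[d].append(m)

    day3_bucketed = by_day[3]

    assert sorted(day3_flat) == sorted(day3_bucketed)
    print(max(by_day[1]))  # largest shake on day 1 -> 5.0

Same data, but the second form is the one you'd actually want to ask questions of.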

One concrete example of this is combinatorial generation. If you make the right observations about the set of data, identifying some pattern that only emerges once you change the way the sequence is represented (e.g. as a lexicographic tree), suddenly it's clear how to generate it more efficiently (or at all).
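For the curious, here's a tiny sketch of that classic idea (a hand-rolled version of the usual textbook "successor" trick, not taken from any particular library): once you treat a k-combination as a sorted sequence and think about what its lexicographic successor looks like, generating all of them becomes a simple loop.

    def next_combination(c, n):
        """Return the lexicographically next k-combination of {0..n-1}
        after c (a sorted list), or None if c is already the last one."""
        c = list(c)
        k = len(c)
        # Walk from the right, finding the first position we can bump up.
        i = k - 1
        while i >= 0 and c[i] == n - k + i:
            i -= 1
        if i < 0:
            return None  # c was the last combination
        c[i] += 1
        # Reset everything to the right of i to the smallest valid values.
        for j in range(i + 1, k):
            c[j] = c[j - 1] + 1
        return c

    # Usage: all 3-element subsets of {0,1,2,3,4}, in lexicographic order.
    c = [0, 1, 2]
    while c is not None:
        print(c)
        c = next_combination(c, 5)

Seen as "the next branch of a lexicographic tree", the algorithm is almost obvious; seen as "enumerate all subsets somehow", it isn't.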

Some open-ended questions...
  • Are we just representing earthquake data in the wrong way, and then because of this/on top of that using the wrong techniques to analyse this useless data? (IMO, this is likely to be a major factor ;)
    • I wonder what this stuff looks like when visualised as quaternions or axis-angle vectors, tracing paths between these data points and maybe playing it back quickly (1 month per minute, or 1 year per minute?).
    • Could the patterns of quake activity be likened to traces of recursive algorithms (potentially running on parallel threads)? (e.g. take a look at some quicksort demos chugging along, or even some substring-search methods out there accessing a "jump" array)
    • What about actually considering earthquakes not in terms of their absolute magnitudes, but rather in terms of their relative magnitude within a time period (i.e. "local maxima/minima") when examining potential correlations? (See the little sketch after this list.)
  • Or does the problem go back further than this, extending right back to how we gather and analyse the data collected from instruments in the field? (Sometimes, those of us who've sat through 1000's of shakes start to wonder about this... "you call THAT a 4.0?!" There doesn't seem to be any reliable mapping from what a quake feels like to Richter/moment-magnitude values, other than that longer usually means higher magnitude, with the force of shaking only seeming to come into it from time to time.)
  • Should we be trying to perform this analysis on a global basis, or on a micro, region-by-region basis? Region-by-region might be more useful when trying to predict for yourself. Does global analysis result in too much conflicting noise, preventing solutions from being found currently, or are some global-scale patterns being ignored instead?
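On the "relative magnitude" point above, here's roughly what I mean, as a minimal and entirely illustrative sketch: the event sequence, window size and function name are all made up, and a real attempt would obviously need proper catalogue data.

    def local_maxima(events, window_days=7.0):
        """events: list of (time_in_days, magnitude), sorted by time.
        Return the events whose magnitude is the largest among all events
        within +/- window_days of them (i.e. local peaks, not absolute ones)."""
        peaks = []
        for (t, m) in events:
            neighbours = [m2 for (t2, m2) in events
                          if abs(t2 - t) <= window_days]
            if m >= max(neighbours):
                peaks.append((t, m))
        return peaks

    # Toy aftershock-like sequence: (day, magnitude)
    sequence = [(0.0, 6.3), (0.5, 4.1), (2.0, 4.9), (9.0, 3.8),
                (10.0, 5.1), (11.5, 4.2), (20.0, 4.4)]
    print(local_maxima(sequence))
    # -> [(0.0, 6.3), (10.0, 5.1), (20.0, 4.4)]

Note how the 4.4 at day 20 counts as a "peak" even though it's smaller than several earlier quakes; that's the point of looking at relative rather than absolute magnitudes.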
