Well, you're correct in noting that, unlike a lot of machine learning problems, there's a very finite amount of training data available! A lot of what I've done is to try to extract additional input parameters from the input Earth data (which is just the elevation + land-vs-ocean map). For example, before running the classifier I compute two different measurements of the shortest distance from each land pixel to ocean (one that weights distance by the elevation crossed along the way, and one that's just raw distance) and feed those in as additional inputs to the model. This has a visible impact on my output: in many cases the isotherms "bend" in the middle of large landmasses much as the real ones do. My results have a ways to go yet (notably they miss things like the influence of ocean currents and high-latitude temperature inversions), but they do produce clear isotherms while looking a bit more organic than a direct application of hard cutoffs. They're worse than I'd like but better than I'd feared: with only the one training set, I'd worried about overfitting, or about just not having enough data for learning to begin with, since I can't exactly get more.
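To make the two distance inputs concrete, here is a minimal sketch of one way they could be computed. It is not the actual code behind the maps in this post: the function name `distance_to_ocean`, the `climb_weight` parameter, and the step-cost formula are all assumptions for illustration. It runs a multi-source Dijkstra search outward from every ocean pixel at once, so each land pixel ends up with its cheapest path cost to any ocean pixel. It ignores east-west wraparound and treats pixels as equal-sized squares, which a real equirectangular map would not.

```python
import heapq

def distance_to_ocean(is_ocean, elevation=None, climb_weight=0.0):
    """Cheapest path cost from each cell to the nearest ocean cell.

    is_ocean:  2D list of bools (True = ocean).
    elevation: optional 2D list of numbers, same shape.
    Assumed step cost between 4-neighbours: 1 + climb_weight * |elevation change|.
    With no elevation (or climb_weight=0) this is plain grid distance.
    """
    rows, cols = len(is_ocean), len(is_ocean[0])
    dist = [[float("inf")] * cols for _ in range(rows)]

    # Seed the queue with every ocean cell at cost 0 (multi-source search).
    heap = []
    for r in range(rows):
        for c in range(cols):
            if is_ocean[r][c]:
                dist[r][c] = 0.0
                heap.append((0.0, r, c))
    heapq.heapify(heap)

    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry, a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = 1.0
                if elevation is not None:
                    step += climb_weight * abs(elevation[nr][nc] - elevation[r][c])
                nd = d + step
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# Tiny one-row example: ocean on the left, a ridge two cells inland.
demo_ocean = [[True, False, False, False]]
demo_elev = [[0, 0, 5, 0]]
raw = distance_to_ocean(demo_ocean)
weighted = distance_to_ocean(demo_ocean, demo_elev, climb_weight=1.0)
```

In the example the ridge leaves the raw distances unchanged but makes the cells on and behind it count as much "farther" from the sea in the weighted version, which is the kind of effect the elevation-aware input is meant to capture.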
For example, running my classifier on this elevation map:
chord2-demNoLines.png
produced the following:
Northern hemisphere's summer ("July"):
chordTest1NS.png
Northern hemisphere's winter ("January"):
chordTest1NW.png
I also tried a similar approach for precipitation, but the results were far less satisfactory with just the set of derived input parameters I already had (latitude, elevation, distance from ocean, and distance from ocean with changes in elevation counted as extra distance). What I'd like to add to the input parameters (for both temperature and precipitation) is some sort of "density of ocean vector" element that represents both the "net" direction to the ocean and how much ocean is close to a point. I've been having trouble figuring out how to compute it efficiently, though: at base the problem is aggregating all the ocean pixels for every land pixel, and that's pretty much a non-starter unless I can find a way to reuse enough of the work to avoid being O(x^2 * y^2).
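One possible way to reuse the work, sketched here under stated assumptions rather than as a tested part of the generator: if "close to a point" is simplified to "inside a square window of a fixed radius", then the ocean count and the summed offset to all ocean pixels in the window can both be read off summed-area tables (2D prefix sums) in constant time per pixel, for O(x * y) total. The names `prefix_sums`, `box_sum`, `ocean_vectors` and the `radius` parameter are hypothetical. The sketch gives every ocean pixel in the window equal weight, clips the window at the map edges, and does not wrap east-west; a distance-weighted version would need a different technique, such as an FFT-based convolution.

```python
def prefix_sums(grid):
    """Summed-area table with a zero border: P[r][c] = sum of grid[:r][:c]."""
    rows, cols = len(grid), len(grid[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        run = 0  # running sum of the current row
        for c in range(cols):
            run += grid[r][c]
            P[r + 1][c + 1] = P[r][c + 1] + run
    return P

def box_sum(P, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and columns c0..c1-1 using the table P."""
    return P[r1][c1] - P[r0][c1] - P[r1][c0] + P[r0][c0]

def ocean_vectors(is_ocean, radius):
    """For each cell, return (count, d_row, d_col) over a square window.

    count is the number of ocean cells in the window; (d_row, d_col) is the
    sum of offsets from the cell to each of those ocean cells, so it points
    toward the side where most of the nearby ocean lies.
    """
    rows, cols = len(is_ocean), len(is_ocean[0])
    mask = [[1 if v else 0 for v in row] for row in is_ocean]
    P_n = prefix_sums(mask)
    P_r = prefix_sums([[r * mask[r][c] for c in range(cols)] for r in range(rows)])
    P_c = prefix_sums([[c * mask[r][c] for c in range(cols)] for r in range(rows)])

    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            n = box_sum(P_n, r0, c0, r1, c1)
            # sum(ocean_row - r) = sum(ocean_row) - r * n, same for columns
            d_row = box_sum(P_r, r0, c0, r1, c1) - r * n
            d_col = box_sum(P_c, r0, c0, r1, c1) - c * n
            out[r][c] = (n, d_row, d_col)
    return out

# 3x3 example with ocean in the top-left corner.
demo = [[True, True, False],
        [True, False, False],
        [False, False, False]]
vectors = ocean_vectors(demo, radius=1)
```

For the centre cell of the example the result is three ocean pixels with a net offset pointing up and to the left, and the bottom-right cell sees no ocean at all within its window. Dividing the offset by the count would give a mean direction, while the count alone serves as the "how much ocean is nearby" input.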
Thanks for linking your blog, that's some amazing work. I especially like the looks you take at demographics and trade; those are things I was interested in simulating too, but I had no idea at all where to start.