
Thread: image processing basics

  1. #11

    Default

    waldronate - i agree, it's not esoteric, and there would already be significant application if it were strongly advantageous. i can't help but consider the cases where the techniques are applicable even at small sizes.. for instance, if i needed to write code to dramatically blur an image today, i'd certainly try a first-order lowpass / "leaky integrator" in each direction before running a 3x3 grid filter for several dozen passes. but that's only because of my erudition.
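
    (a minimal sketch of that leaky-integrator idea, assuming a float greyscale buffer; the function and variable names here are purely illustrative:)

    Code:
    	/* one horizontal pass of a first-order lowpass ("leaky integrator"),
    	   run forward then backward so the smoothing is symmetric.
    	   img is a w*h float buffer, a is the smoothing coefficient in (0,1]. */
    	void blur_rows(float *img, int w, int h, float a)
    	{
    		int x, y;
    		for (y = 0; y < h; y++) {
    			float *row = img + y * w;
    			float acc = row[0];
    			for (x = 0; x < w; x++) {		/* left to right */
    				acc += a * (row[x] - acc);
    				row[x] = acc;
    			}
    			acc = row[w - 1];
    			for (x = w - 1; x >= 0; x--) {	/* right to left */
    				acc += a * (row[x] - acc);
    				row[x] = acc;
    			}
    		}
    	}
    	/* run this over the rows, then the same loop down each column,
    	   for a cheap large-radius blur in two passes per axis */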

    i'm currently rebuilding my app, perhaps i'll give it a test run to blur a height field

  2. #12
    Administrator waldronate

    Default

    ftp://69.31.121.43/developer/present...ing_Tricks.pdf is a good starting point. A lot of the GDC discussions are useful for application to real-time graphics systems today; I would recommend SIGGRAPH papers for more general research. http://kesen.realtimerendering.com/ is an excellent resource for locating many papers in the computer graphics field.

    One of the fun things about graphics systems these days is that you're often looking at hundreds to thousands of processors on relatively inexpensive systems (way less than $5k) with a fair amount more than 10 teraflops of single-precision floating-point power. When you start looking at GPUs, it's often easier and sometimes faster to do all of your work directly in floating-point than to try to keep everything in fixed-point or integer notations. Even for some CPU implementations these days you may find that floating-point will run certain algorithms faster than integer.

    Modern PC systems are hopelessly memory-bound, and the reason GPUs can be faster than CPUs is often that GPUs have more memory bandwidth (100s of GB/s) than the CPU's main memory does (10s of GB/s). Using half-floats can ease the pressure a little, but you have to live with the precision loss. But I would much rather code floating-point algorithms than fixed-point ones because I have fewer things to keep track of.

    I'm lazy enough that I'd probably code a blur as an FFT on the image and blur kernel, multiply the two images, and then inverse FFT on the result image. That way I can get an arbitrary blur kernel for the same cost as a symmetric one (and a sharpen pretty much comes down to a divide instead of multiply).
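
    (A minimal sketch of that frequency-domain approach, using the FFTW library purely for illustration; the thread doesn't assume any particular FFT implementation, and the buffer names are placeholders. Forward-transform the image and the kernel, multiply the spectra, then inverse-transform.)

    Code:
    	#include <fftw3.h>

    	/* blur img (w*h floats, row-major) in place by circular convolution
    	   with kernel (also w*h floats). FFTW's inverse transform is
    	   unnormalized, so the 1/(w*h) factor is folded into the multiply. */
    	void fft_blur(float *img, float *kernel, int w, int h)
    	{
    		int i, n = h * (w / 2 + 1);			/* size of an r2c spectrum */
    		float norm = 1.0f / (float)(w * h);
    		fftwf_complex *I = fftwf_alloc_complex(n);
    		fftwf_complex *K = fftwf_alloc_complex(n);

    		fftwf_plan fi = fftwf_plan_dft_r2c_2d(h, w, img, I, FFTW_ESTIMATE);
    		fftwf_plan fk = fftwf_plan_dft_r2c_2d(h, w, kernel, K, FFTW_ESTIMATE);
    		fftwf_plan bi = fftwf_plan_dft_c2r_2d(h, w, I, img, FFTW_ESTIMATE);

    		fftwf_execute(fi);					/* forward transforms */
    		fftwf_execute(fk);

    		for (i = 0; i < n; i++) {			/* complex multiply of the spectra */
    			float re = I[i][0] * K[i][0] - I[i][1] * K[i][1];
    			float im = I[i][0] * K[i][1] + I[i][1] * K[i][0];
    			I[i][0] = re * norm;
    			I[i][1] = im * norm;
    		}

    		fftwf_execute(bi);					/* back to the spatial domain */

    		fftwf_destroy_plan(fi); fftwf_destroy_plan(fk); fftwf_destroy_plan(bi);
    		fftwf_free(I); fftwf_free(K);
    	}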

  3. #13

    Default

    Quote Originally Posted by waldronate View Post
    i am unable to rep you at present :p

    this does demonstrate IIR lowpass (apparently a 'gaussian blur' technique) and resonant biquad filtering. i can't see any application for the latter beyond a costly and somewhat arbitrary video transform, but you never know. peanut butter and chocolate thing.

    tackling mr. petzold (for whom i have no words) currently

  4. #14

    Default

    question..

    ..so i've got my basic height field editor application, with a few filters and brushes. i'm using SetPixel in WM_PAINT to redraw the window, which is archaic.

    is DirectDraw the choice for upping the performance of my app to a more tolerable level while maintaining compatibility?

    another question i wanted to ask is, being new to this community per se, are there any holes that need to be filled in terms of applications? i'm expecting WILBUR covers all the bases for 'informed' fantasy map generation..
    Last edited by xoxos; 04-21-2012 at 01:16 PM.

  5. #15
    Administrator Redrobes

    Default

    Don't use SetPixel or even SetPixelV as they are amazingly slow. You could use DirectDraw, but that might be a pain, and if you're still going to use it with calls to per-pixel APIs then it will be even slower than GDI.

    The best bet for you right now, Windows-only, is to use the CreateDIBSection function and look into the ppvBits parameter. That gets you a memory pointer to the bitmap bits so you can update them without going through the hassle of SetPixel, i.e. it's much, much faster. There are faster methods, but not likely any that are both faster and easier.

    Once you have the bitmap modified to your user's liking, then during WM_PAINT you can blit the bitmap directly to the paint DC with one call to StretchDIBits and not faff about with per-pixel calls. It will easily be more than 100x faster.
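
    (A rough sketch of that route, as a fragment from a window procedure; width, height, hwnd and the pixel write are placeholders, and error handling is omitted:)

    Code:
    	/* requires <windows.h>; set up once: a 32-bit top-down DIB whose
    	   pixel memory we can write to directly via ppvBits */
    	BITMAPINFO bmi = {0};
    	void *bits = NULL;						/* this is the ppvBits pointer */
    	bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    	bmi.bmiHeader.biWidth = width;
    	bmi.bmiHeader.biHeight = -height;		/* negative height = top-down rows */
    	bmi.bmiHeader.biPlanes = 1;
    	bmi.bmiHeader.biBitCount = 32;
    	bmi.bmiHeader.biCompression = BI_RGB;
    	HBITMAP hbm = CreateDIBSection(NULL, &bmi, DIB_RGB_COLORS, &bits, NULL, 0);

    	/* editing: poke pixels straight into memory instead of SetPixel;
    	   a 32-bit DIB pixel reads as 0x00RRGGBB in a DWORD */
    	((DWORD*)bits)[y * width + x] = ((DWORD)r << 16) | (g << 8) | b;

    	/* WM_PAINT: one blit of the whole bitmap to the window DC */
    	case WM_PAINT: {
    		PAINTSTRUCT ps;
    		HDC hdc = BeginPaint(hwnd, &ps);
    		StretchDIBits(hdc, 0, 0, width, height,	/* destination rectangle */
    		              0, 0, width, height,		/* source rectangle */
    		              bits, &bmi, DIB_RGB_COLORS, SRCCOPY);
    		EndPaint(hwnd, &ps);
    	} break;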

    From a compatibility point of view, OpenGL is more cross-platform than DirectDraw. If you're interested in 3D height terrain viewing then you can look at my free 3D terrain viewer.
    http://www.viewing.ltd.uk/viewingdal...gonsFlight.zip
    Just put a "height.bmp" file into the same folder and run it up. The height bitmap is greyscale. If you tried my Instant Islands, it exports that same file, so put the two together: run II, export the terrain you like, then run DF.

    Oh, and on the blur I'm with Waldronate: I would use an FFT or DFT, multiply in a Gaussian kernel, and then go back with another FFT/DFT in reverse. If you were only blurring by a couple of pixels then a convolution is easier to code and probably faster on a normal CPU. The larger the blur, the more likely the FFT will win on performance.
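
    (And for the small-blur case, a sketch of one separable convolution pass; run it across the rows, then the same idea down the columns. The kernel and names are illustrative.)

    Code:
    	/* convolve each row of a w*h float image with a small symmetric 1D
    	   kernel of half-width r (e.g. {0.25f, 0.5f, 0.25f} has r = 1);
    	   edges are clamped. do the columns afterwards for the full 2D blur. */
    	void convolve_rows(const float *src, float *dst, int w, int h,
    	                   const float *k, int r)
    	{
    		int x, y, t;
    		for (y = 0; y < h; y++) {
    			for (x = 0; x < w; x++) {
    				float sum = 0.f;
    				for (t = -r; t <= r; t++) {
    					int xi = x + t;
    					if (xi < 0) xi = 0;
    					if (xi > w - 1) xi = w - 1;	/* clamp at the borders */
    					sum += src[y * w + xi] * k[t + r];
    				}
    				dst[y * w + x] = sum;
    			}
    		}
    	}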
    Last edited by Redrobes; 04-21-2012 at 06:50 PM.

  6. #16

    Default

    thank you for the practical answer, you've probably saved me a lot of time.

  7. #17

    Default

    found this last night, strange that i didn't find it when i was researching the topic.. ken musgrave's dissertation on procedural landscapes -

    http://www.kenmusgrave.com/Old_Stuff...ssertation.pdf


    currently fishing for ways to improve the efficiency of the 3d perlin algorithm.. looks real nice but takes over a minute to process for 8 octaves (over 1.4 trillion cubic interpolations..) and atm only rotation on the y-axis is implemented (that would add a few trillion transcendental functions).

    thought about using a precomputed 3d array.. eg. generate an 8-8-8 perlin array, then use tricubic interpolation to flesh that out to say a 64-64-64 array (2^18 cubic interpolations) so there are 8 'curved' points between each original 'sample'. the 64-64-64 array is then read with linear interpolation (0.33 trillion lerps).
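
    (a sketch of the cheap read stage described above, assuming the upsampled table is a 64x64x64 float array, non-negative coordinates, and wraparound at the edges; names illustrative:)

    Code:
    	/* read the precomputed 64x64x64 table with trilinear interpolation;
    	   px, py, pz are in table units and wrap at the edges */
    	float cube_lerp(float cube[64][64][64], float px, float py, float pz)
    	{
    		int x0 = (int)px, y0 = (int)py, z0 = (int)pz;
    		float fx = px - x0, fy = py - y0, fz = pz - z0;
    		int x1 = (x0 + 1) & 63, y1 = (y0 + 1) & 63, z1 = (z0 + 1) & 63;
    		x0 &= 63;	y0 &= 63;	z0 &= 63;

    		/* 4 lerps in x, 2 in y, 1 in z */
    		float c00 = cube[x0][y0][z0] + fx * (cube[x1][y0][z0] - cube[x0][y0][z0]);
    		float c10 = cube[x0][y1][z0] + fx * (cube[x1][y1][z0] - cube[x0][y1][z0]);
    		float c01 = cube[x0][y0][z1] + fx * (cube[x1][y0][z1] - cube[x0][y0][z1]);
    		float c11 = cube[x0][y1][z1] + fx * (cube[x1][y1][z1] - cube[x0][y1][z1]);
    		float c0 = c00 + fy * (c10 - c00);
    		float c1 = c01 + fy * (c11 - c01);
    		return c0 + fz * (c1 - c0);
    	}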

    for some reason my compiler crashes with very large arrays.. i can do 4-4096-2048 (2^25) but not 2^26.

  8. #18
    Administrator waldronate

    Default

    If your compiler is a 32-bit compiler, it likely can't give you much more than 1GB or so in a single allocation (you're breaking between 256MB and 512MB, which might also be a 32-bit compiler limit).

    One efficiency insight is that the scale of each octave varies. Thus, a small power of two array would capture the first octave at adequate resolution, one twice as large for the next octave, and so on. It doesn't necessarily save you any time in the initial generation phase, but as you zoom in you only need to compute the new octaves of data and if you zoom out then you already have much of the data precomputed. Caching can also be very important for certain gradient-based effects such as erosion ( http://dmytry.com/mojoworld/erosion_fractal/home.html has an example ).
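
    (A sketch of that layout, assuming simple value noise on power-of-two grids; fill_noise and bilinear are placeholders rather than anything from the thread:)

    Code:
    	#include <stdlib.h>

    	/* one cached array per octave: octave j is sampled on a (BASE << j)^2
    	   grid, so its resolution tracks its frequency. zooming in only adds
    	   finer octaves; the coarse ones already in the cache are reused. */
    	#define BASE    32
    	#define MAXOCT  8

    	static float *octave[MAXOCT];	/* octave[j] holds (BASE<<j) * (BASE<<j) samples */

    	static float *get_octave(int j)
    	{
    		int n = BASE << j;
    		if (!octave[j]) {
    			octave[j] = malloc(sizeof(float) * n * n);
    			fill_noise(octave[j], n, j);	/* generate only when first needed */
    		}
    		return octave[j];
    	}

    	/* sum the cached octaves at normalized coordinates (u, v) in [0,1) */
    	float fbm(float u, float v, int octaves)
    	{
    		float sum = 0.f, amp = 0.5f;
    		int j;
    		for (j = 0; j < octaves; j++) {
    			int n = BASE << j;
    			sum += amp * bilinear(get_octave(j), n, u * n, v * n);
    			amp *= 0.5f;
    		}
    		return sum;
    	}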

    Also, if you're looking for visual effects rather than trying to do fancy processing of the generated data, consider http://mrl.nyu.edu/~perlin/demox/Planet.html for inspiration. It uses a coarse-to-fine scheme for drawing terrain zooms. Users get to see much of what they're interested in very quickly and the software can continue to generate additional detail as the user waits longer. Caching of lower-resolution details and variable numbers of octaves may also be at work here to keep the amount of computation constant from frame to frame.

    And on the Musgrave front, consider that only a short 20 years later, what he described can be done in real-time directly in your browser ( http://codeflow.org/entries/2011/nov...g-and-erosion/ ). A more optimized version would probably run a bit faster, but things are usually memory bandwidth-limited these days. Except for pathological cases like procedural generators, of course.

    And again, simplex noise is somewhat more computationally efficient than the interpolant used in straight Perlin noise. And a simple 2D wavelet-type "scale-and-add" operation will always be faster than a volume interpolation. If you can push your basis function texture to the graphics card, it will do the interpolation, scaling, and adding all for you. Plus rotation, too, if you're into that sort of thing.

  9. #19

    Default

    Quote Originally Posted by waldronate View Post
    If your compiler is a 32-bit compiler, it would likely not be able to generate data much more than 1GB or so in a single allocation (you're breaking between 256MB and 512MB, which might also be a 32-bit compiler limit).
    quite right, i was only looking at the indexing not the footprint

    i'm going to have to think about what you've said as i may be going about this in the wrong fashion. my implementation seems very elementary to me without any optimisations between octaves. i'll paste it below in case you feel like (lol) poring through someone else's code..

    simplex noise didn't sink in at all. i'd imagine a year of familiarisation with 2,3d thinking may change that. fortunately, since this is more of an exercise, the only thing i'd really gain by speed improvements is the time spent in development. ty again.


    Code:
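    	/* context assumed from the surrounding thread (declared elsewhere):
    	   perlin[32][32][32]  - lattice of random values
    	   pp2[] / pn2[]       - per-octave frequency / amplitude scales
    	   tricint(t, a,b,c,d) - 4-point cubic interpolation at fraction t
    	   hf[hu][x][y]        - output height field; hfbase, adjlong, woct,
    	                         sumdiv - sphere radius, longitude offset,
    	                         octave count, and output normalization */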
    	for (y = 0; y < 2048; y++) {
    		fla = (float)y / 651.580337f;	//	2047/pi
    		fy = cos(fla) * hfbase;	fr = -sin(fla) * hfbase;
    		for (x = 0; x < 4096; x++) {
    			hf[hu][x][y] = 0;
    			flo = (float)x / 651.7394919613114f + adjlong;	//	4095/tau
    			fx = cos(flo) * fr;	fz = sin(flo) * fr;
    			//	fx, fy, and fz mapped to sphere of radius 1 with center at origin if hfbase is removed from above
    
    			float sum = 0.f;
    			for (j = 0; j < woct; j++) {
    				dx = fx * pp2[j];	dy = fy * pp2[j];	dz = fz * pp2[j];
    				if (dx < 0.f) dx += 131072.f;	ix = (int)dx;	dx -= ix;	x1 = ix & 0x1f;
    				if (dy < 0.f) dy += 131072.f;	iy = (int)dy;	dy -= iy;	y1 = iy & 0x1f;
    				if (dz < 0.f) dz += 131072.f;	iz = (int)dz;	dz -= iz;	z1 = iz & 0x1f;
    
    				x0 = x1 - 1;	x2 = x1 + 1;	x3 = x1 + 2;
    				y0 = y1 - 1;	y2 = y1 + 1;	y3 = y1 + 2;
    				z0 = z1 - 1;	z2 = z1 + 1;	z3 = z1 + 2;
    				x0 &= 0x1f;	x2 &= 0x1f;	x3 &= 0x1f;
    				y0 &= 0x1f;	y2 &= 0x1f;	y3 &= 0x1f;
    				z0 &= 0x1f;	z2 &= 0x1f;	z3 &= 0x1f;
    
    				p0 = tricint(dx, perlin[x0][y0][z0], perlin[x1][y0][z0], perlin[x2][y0][z0], perlin[x3][y0][z0]);
    				p1 = tricint(dx, perlin[x0][y1][z0], perlin[x1][y1][z0], perlin[x2][y1][z0], perlin[x3][y1][z0]);
    				p2 = tricint(dx, perlin[x0][y2][z0], perlin[x1][y2][z0], perlin[x2][y2][z0], perlin[x3][y2][z0]);
    				p3 = tricint(dx, perlin[x0][y3][z0], perlin[x1][y3][z0], perlin[x2][y3][z0], perlin[x3][y3][z0]);
    				pa = tricint(dy, p0, p1, p2, p3);
    
    				p0 = tricint(dx, perlin[x0][y0][z1], perlin[x1][y0][z1], perlin[x2][y0][z1], perlin[x3][y0][z1]);
    				p1 = tricint(dx, perlin[x0][y1][z1], perlin[x1][y1][z1], perlin[x2][y1][z1], perlin[x3][y1][z1]);
    				p2 = tricint(dx, perlin[x0][y2][z1], perlin[x1][y2][z1], perlin[x2][y2][z1], perlin[x3][y2][z1]);
    				p3 = tricint(dx, perlin[x0][y3][z1], perlin[x1][y3][z1], perlin[x2][y3][z1], perlin[x3][y3][z1]);
    				pb = tricint(dy, p0, p1, p2, p3);
    
    				p0 = tricint(dx, perlin[x0][y0][z2], perlin[x1][y0][z2], perlin[x2][y0][z2], perlin[x3][y0][z2]);
    				p1 = tricint(dx, perlin[x0][y1][z2], perlin[x1][y1][z2], perlin[x2][y1][z2], perlin[x3][y1][z2]);
    				p2 = tricint(dx, perlin[x0][y2][z2], perlin[x1][y2][z2], perlin[x2][y2][z2], perlin[x3][y2][z2]);
    				p3 = tricint(dx, perlin[x0][y3][z2], perlin[x1][y3][z2], perlin[x2][y3][z2], perlin[x3][y3][z2]);
    				pc = tricint(dy, p0, p1, p2, p3);
    
    				p0 = tricint(dx, perlin[x0][y0][z3], perlin[x1][y0][z3], perlin[x2][y0][z3], perlin[x3][y0][z3]);
    				p1 = tricint(dx, perlin[x0][y1][z3], perlin[x1][y1][z3], perlin[x2][y1][z3], perlin[x3][y1][z3]);
    				p2 = tricint(dx, perlin[x0][y2][z3], perlin[x1][y2][z3], perlin[x2][y2][z3], perlin[x3][y2][z3]);
    				p3 = tricint(dx, perlin[x0][y3][z3], perlin[x1][y3][z3], perlin[x2][y3][z3], perlin[x3][y3][z3]);
    				pd = tricint(dy, p0, p1, p2, p3);
    
    				o = tricint(dz, pa, pb, pc, pd);
    				if (j < 2) {
    					sum += o * pn2[j + 1];
    				}
    				else {
    					if (o > 32767.5f) o = 65535.f - o;
    					sum += o * pn2[j];
    				}
    			}
    			sum *= sumdiv;
    			i = (int)sum;
    			if (i < 0) i = 0;
    			else if (i > 65535) i = 65535;
    			hf[hu][x][y] = i >> 8;
    	}	}
    wow, that gradient b/g makes code a trip :p

    (realising that i ought to add latitudinal rotation as well, since it only needs to be performed for each pixel, not each octave..)
    Last edited by xoxos; 06-07-2012 at 02:21 PM.

  10. #20
    Administrator waldronate

    Default

    It's tough to say how things will work out without seeing the data structures behind that code, but it seems like it ought to be fairly straightforward to decode.

    Depending on your compiler and options, the truncation to integer may be one of the slowest operations that you have. Similarly, if you're targeting x87 code rather than something like SSE2, you're leaving a lot of performance on the table. Indexing the 3D array may (or may not) be more expensive than indexing a 1D array with precomputed offsets, especially on P4-class processors that are lacking in barrel-shifter resources.
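
    (For example, a sketch of the flattened-index idea against the 32^3 lattice in the posted code; names illustrative:)

    Code:
    	/* flattening perlin[32][32][32] into a single array turns the
    	   three-level index into one shift-and-add; whether that's actually
    	   faster depends on the compiler and the target CPU */
    	static float perlin1d[32 * 32 * 32];

    	#define IDX(x, y, z)  (((x) << 10) + ((y) << 5) + (z))	/* x*1024 + y*32 + z */

    	float corner(int x, int y, int z)
    	{
    		return perlin1d[IDX(x & 31, y & 31, z & 31)];	/* wrap to the lattice */
    	}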

    I'm a bit confused by the number and type of interpolations that you have. Classic Perlin noise has 7 linear interpolations (4 in x, 2 in y, and 1 in z). The fractional index term is first modified by (3*t*t-2*t*t*t) to get the desired smooth behavior (the improved Perlin noise uses a better quintic function).
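
    (A sketch of that interpolation skeleton: the fade curves and the 7 lerps over the 8 corner values of a cell. Where the corner values come from, a value lattice, gradient dot products, etc., is up to the surrounding noise function.)

    Code:
    	/* classic Perlin fade: smooth the fractional coordinate before lerping */
    	static float fade3(float t) { return t * t * (3.f - 2.f * t); }			/* 3t^2 - 2t^3 */
    	static float fade5(float t) { return t * t * t * (t * (t * 6.f - 15.f) + 10.f); }	/* improved quintic */
    	static float lerp(float t, float a, float b) { return a + t * (b - a); }

    	/* 7 lerps per sample: 4 in x, 2 in y, 1 in z */
    	float trilerp(float c000, float c100, float c010, float c110,
    	              float c001, float c101, float c011, float c111,
    	              float dx, float dy, float dz)
    	{
    		float u = fade3(dx), v = fade3(dy), w = fade3(dz);
    		return lerp(w, lerp(v, lerp(u, c000, c100), lerp(u, c010, c110)),
    		               lerp(v, lerp(u, c001, c101), lerp(u, c011, c111)));
    	}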

    Spinning the sphere should definitely be done outside of the main loop. The code that I normally use for such computations passes in the Cartesian coordinate for evaluation and doesn't know anything about how the world is sampled.

