In an OS you have user space and kernel space. A while ago MS didn't think we would ever get to 4GB, so the 32-bit address space was divided evenly, 2GB each. The kernel has to keep some RAM locked down, because it's the low-level stuff and your machine will die without it. Anyway, all the apps you run, apart from device drivers, live in user space, so on the old setups your user apps could get at 2GB. Then, as machines got near 4GB, MS realized the 2/2 split wasn't very sensible (the kernel doesn't need a full 2GB to run) and allowed a different split. I've heard all sorts of numbers here, but yours are about the norm: roughly 3.5GB for user space and 0.5GB for the kernel. You're right too that certain graphics cards that share system RAM can eat into that, reserving more for the kernel and leaving less for user space. Most cards have their own video RAM and only need an "aperture", a kind of cache used to transfer data from system RAM into video RAM. So a bit more detail there, but you're right on the money.
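If you want to see where the split lands on your own box, here's a quick sketch using the stock Win32 calls GetSystemInfo and GlobalMemoryStatusEx (nothing beyond windows.h is assumed); it just prints the user-mode address range the OS hands your process:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof(ms);
        GlobalMemoryStatusEx(&ms);

        /* On a stock 32-bit Windows the top of this range sits just
           under 2 GB; it is higher when the user/kernel split has
           been changed, and much higher on 64-bit Windows.          */
        printf("User-mode address range: %p - %p\n",
               si.lpMinimumApplicationAddress,
               si.lpMaximumApplicationAddress);
        printf("Virtual address space for this process: %llu MB\n",
               (unsigned long long)(ms.ullTotalVirtual / (1024 * 1024)));
        return 0;
    }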
With the cards though, it's been possible for some time to use them as processing engines, ever since they got programmable shaders, although the early shader programs were very limited and simple. As time went on shader complexity went up, and then we got a full-scale shading language built into OpenGL 2. So basically your graphics card needed to run arbitrary code. nVidia and ATI both now make GPUs that run arbitrary code, with interfaces anyone can use: right now that's CUDA for nVidia and Close to Metal / Stream for ATI, I think it's called. It's down to the card, not the DX version, though. There's a newer, vendor-neutral interface from the same people who did OpenGL, called OpenCL (the Open Computing Language), and the idea is that once nVidia and ATI both ship unrestricted, properly licensed drivers for it, we can write in OpenCL and run on either. Not sure where that stands now, but last I heard you still had to be a registered dev with nVidia to get access. When it happens, graphics-intensive apps like PS will make more use of these APIs and accelerate some of the processing in hardware, which will speed things up a lot.

In the meantime, though, you can expect your CPU to be in overdrive when doing large-area computes like a blur. If the image is so large you're into pagefile land, then as said earlier you're at that 100:1 HDD ratio: your CPU does 1% of the work and the HDD does 99%+, so you fall off the performance cliff. It's in these cases that you get a 100x speed increase if you can prevent the system paging! I.e. it's the point you must stay behind in order to work effectively. So my tip is to tile the image and stay behind that point, or it gets painful; see the sketch below.
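To make the tiling tip concrete, here's a minimal sketch in C (the file names, the image width and the toy "invert" filter are all made-up placeholders for whatever operation you're really running) that walks a raw 8-bit greyscale image in fixed-size strips, so only one strip is ever resident and the working set stays well clear of the pagefile:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical filter standing in for whatever you're doing
       (blur, sharpen, ...); here it just inverts each pixel.     */
    static void apply_filter(unsigned char *buf, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            buf[i] = (unsigned char)(255 - buf[i]);
    }

    int main(void)
    {
        const size_t width   = 20000;  /* pixels per row (example)      */
        const size_t strip_h = 256;    /* rows per strip (tune to RAM)  */

        FILE *in  = fopen("in.raw",  "rb");   /* made-up input file  */
        FILE *out = fopen("out.raw", "wb");   /* made-up output file */
        if (!in || !out)
            return 1;

        /* Only one strip is ever allocated: ~5 MB here instead of the
           whole multi-gigabyte image, so the OS never has to page.    */
        unsigned char *strip = malloc(width * strip_h);
        if (!strip)
            return 1;

        size_t got;
        while ((got = fread(strip, 1, width * strip_h, in)) > 0) {
            apply_filter(strip, got);
            fwrite(strip, 1, got, out);
        }

        free(strip);
        fclose(in);
        fclose(out);
        return 0;
    }

Strip height is the knob: bigger strips mean fewer passes over the file, but go too big and you're back into paging territory, which is exactly the cliff you're trying to avoid.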