The SSE image code requires SSE 4.1, which on Intel means a 45 nm Core 2 (Penryn) or newer. The original 65 nm Core 2 chips only go up to SSSE3.
If you have a CPU without SSE 4.1, please let me know if you see any problems (you shouldn't unless our SSE detection isn't quite right).
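For anyone curious what SSE 4.1 detection involves, here is a minimal sketch. It is not our actual detection code: the names `EcxHasSse41` and `CpuHasSse41` are made up for illustration, and it assumes GCC or Clang (it relies on their `<cpuid.h>` header, which MSVC does not ship). The one fixed fact it relies on is that CPUID leaf 1 reports SSE 4.1 in bit 19 of ECX.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// CPUID leaf 1 reports SSE 4.1 support in bit 19 of ECX.
constexpr std::uint32_t kSse41Bit = 1u << 19;

// Pure helper (hypothetical name) so the bit test can be checked
// without depending on the machine it runs on.
constexpr bool EcxHasSse41(std::uint32_t ecx) {
    return (ecx & kSse41Bit) != 0;
}

// Hypothetical runtime check. Returns false on non-x86 builds or when
// CPUID leaf 1 is unavailable, so callers fall back to the non-SSE path.
bool CpuHasSse41() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return EcxHasSse41(ecx);
#else
    return false;
#endif
}
```

The idea is to call something like `CpuHasSse41()` once at startup and pick the SSE or plain code path from the result, so a CPU without SSE 4.1 never executes an unsupported instruction.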
The core image code is some of the most time-critical in the program, since it underpins the entire user interface (Standard View and Theater View).
Here are my image benchmark results from a Sandy Bridge machine:
17.0.108
Running 'Image' benchmark...
Image creation / destruction... 0.687 seconds
Flood filling... 0.334 seconds
Direct copying... 0.595 seconds
Small renders... 1.731 seconds
Bilinear rendering... 0.902 seconds
Bicubic rendering... 0.814 seconds
Score: 4345
17.0.109
Running 'Image' benchmark...
Image creation / destruction... 0.644 seconds
Flood filling... 0.328 seconds
Direct copying... 0.595 seconds
Small renders... 1.281 seconds
Bilinear rendering... 0.768 seconds
Bicubic rendering... 0.892 seconds
Score: 4880
17.0.110
Running 'Image' benchmark...
Image creation / destruction... 0.719 seconds
Flood filling... 0.334 seconds
Direct copying... 0.595 seconds
Small renders... 1.064 seconds
Bilinear rendering... 0.766 seconds
Bicubic rendering... 0.624 seconds
Score: 5361
17.0.111
Running 'Image' benchmark...
Image creation / destruction... 0.686 seconds
Flood filling... 0.327 seconds
Direct copying... 0.596 seconds
Small renders... 1.033 seconds
Bilinear rendering... 0.710 seconds
Bicubic rendering... 0.623 seconds
Score: 5536
27.4% faster overall (score up from 4345 to 5536)
I think there's a little more performance left to find, although I'm not sure when we'll find it.
Proper support for partial alpha makes the algorithms a lot tougher, because you end up having to deal with colors and alpha independently. Otherwise you get weird things, like drawing from a transparent swatch of a funny color bleeding or fading that color into the output.
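To make the bleeding problem concrete, here is a small sketch of averaging two pixels, which is the kind of thing a bilinear filter does. It is an illustration only, not our rendering code: the `Rgba` struct and both function names are hypothetical, and it assumes 8-bit straight (non-premultiplied) alpha. The naive version averages every channel on its own; the second weights each color by its alpha, so a fully transparent pixel contributes no color.

```cpp
#include <cstdint>

// Hypothetical pixel type: 8-bit channels, straight (non-premultiplied) alpha.
struct Rgba {
    std::uint8_t r, g, b, a;
};

// Naive average: every channel is averaged on its own, so the color of a
// fully transparent pixel still leaks into the result.
Rgba AverageNaive(Rgba p, Rgba q) {
    auto avg = [](unsigned x, unsigned y) {
        return static_cast<std::uint8_t>((x + y + 1) / 2);  // rounded mean
    };
    return {avg(p.r, q.r), avg(p.g, q.g), avg(p.b, q.b), avg(p.a, q.a)};
}

// Alpha-weighted average: alpha is averaged directly, but each color channel
// is weighted by its pixel's alpha, so invisible pixels add no color.
Rgba AverageAlphaWeighted(Rgba p, Rgba q) {
    const unsigned alpha_sum = static_cast<unsigned>(p.a) + q.a;
    if (alpha_sum == 0) {
        return {0, 0, 0, 0};  // both inputs invisible: color is irrelevant
    }
    auto mix = [&](unsigned cp, unsigned cq) {
        return static_cast<std::uint8_t>(
            (cp * p.a + cq * q.a + alpha_sum / 2) / alpha_sum);
    };
    return {mix(p.r, q.r), mix(p.g, q.g), mix(p.b, q.b),
            static_cast<std::uint8_t>((alpha_sum + 1) / 2)};
}
```

Averaging a fully transparent red pixel `{255, 0, 0, 0}` with an opaque white pixel `{255, 255, 255, 255}` gives `{255, 128, 128, 128}` with the naive version, a pink tint, while the alpha-weighted version gives `{255, 255, 255, 128}`, plain white at half opacity. The cost is an extra multiply and divide per channel, which is part of why partial alpha is harder to make fast.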