The 2nd title I’ve worked on for Bizarre Creations was announced today. Here is the trailer… Enjoy!
A few days back I did an interview with fellow Bizarreo, Charlie Birtwistle, about the technology behind Blur. I’m pleased to announce that it is now up for your reading pleasure over at Eurogamer; see the link below. Any comments or questions are most welcome.
Eurogamer posted the following comparison of Blur on 360 and PS3:
A common thing I do in the debugger is to take the raw value of a memory address that I know points to a vector of some particular type, cast it to a pointer to that type, and then postfix it with “, n”, where n is the number of elements I want the debugger to show me in the watch window. I guess I must have had it too good with ProDG for too long, as doing this there just works. I tried the same in Visual Studio and got some very odd behaviour: it showed me the array, but I was unable to expand the elements of that array to view the contents of the structure encapsulated in the vector. Incidentally, Visual Studio 2008’s watch window doesn’t even come equipped with a horizontal scroll bar, which makes it impossible to simply drag the width of the value column out far enough for the value to be viewed.
After scratching my head a little I called over one of my colleagues who has a particular penchant for Visual Studio. He was equally puzzled for a minute or so, before spotting the problem… The space between the , and the value of n! (Foo*)0xbaadf00d, n doesn’t work as expected, but (Foo*)0xbaadf00d,n does. Given that sensible coders put a space between function arguments, I think Visual Studio has a bug here that needs sorting out.
Hopefully I can get back to ProDG land soon and make the pain go away.
Since this is all over the Internets now, I think it’ll be okay to link to it. This was the unannounced game I was working on before it unfortunately got culled from THQ’s product roster in early November 2008. I won’t go into too much detail about what the game was like, as to be truthful I’m still not 100% sure how much I can say about it. The trailer is damn cool though.
Happy New Year!
So I finished work for the year on the 22nd and have been bored ever since! Sad state of affairs, isn’t it? Anyway, I thought I would make a post since I had the time, so see how you get on with the following:
Q. How many times have you written some C++ code that takes the following general form:
if (bTest) iR = iA;
else iR = iB;
If the answer is more than once, you will probably (hopefully) be slightly interested in this post. The PowerPC ISA offers the awesome fsel instruction, which conditionally loads a floating-point register with one of two values based on the value stored in another of the instruction’s argument registers. In this case, however, we are dealing with integer values. Assuming that there is no equivalent isel instruction, how would you conditionally load a register based on a boolean value?
The C++ standard has the following to say about the conversion of a value of type bool to a value of integer type:
4.7 Integral Conversions [conv.integral]
5. If the destination type is bool see 4.12. If the source value is bool, the value false is converted to zero and the value true is converted to one.
Since we can rely on a true boolean value becoming 1 when we convert it to an unsigned int, multiplying a true bool that has been cast to an unsigned int by any other integer value will of course yield that other value. Multiplying a bool holding false, on the other hand, will of course result in 0. We use this simple fact to generate ourselves a bitmask that will come in handy for the 2nd part of the branch avoidance trickery!
const unsigned int iVal = static_cast<unsigned int>(bTest) * 0xffffffff;
iR = (iA & iVal) | (iB & ~iVal);
The above code works by generating a mask with all bits turned on (0xffffffff, assuming a 32-bit unsigned int) in the event that the boolean value holds true; conversely, when the boolean value is false, the value 0x00000000 is generated. The 2nd line of code will, for each bit in iVal, select the equivalent bit of iA if the bit in question is set, or the equivalent bit of iB if it is unset. By making a mask that is at one extreme or the other (i.e. all bits on or off), we are able to perform a conceptual branch and assign the complete value of iR dependent on the value held in the bool used to generate the mask.
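For anyone who wants to play with it, here is a minimal sketch of the same trick wrapped up as functions. The names select_u32 and select_u32_neg are made up for illustration and are not from any library; the sketch assumes a 32-bit value, and the second variant simply builds the same mask by unsigned negation instead of a multiply. Whether either one actually beats a plain if/else depends on the compiler and target, so treat it as something to measure rather than a guaranteed win.

```cpp
#include <cstdint>

// Hypothetical helper (illustration only): returns a when test is true and
// b when test is false, with no branch written in the source.
inline std::uint32_t select_u32(bool test, std::uint32_t a, std::uint32_t b)
{
    // true converts to 1, so the mask is 0xffffffff; false converts to 0,
    // so the mask is 0x00000000.
    const std::uint32_t mask = static_cast<std::uint32_t>(test) * 0xffffffffu;
    // Take every bit of a where the mask is set, every bit of b where it is not.
    return (a & mask) | (b & ~mask);
}

// Same idea, but the mask comes from unsigned wrap-around:
// 0 - 1 wraps to 0xffffffff, while 0 - 0 stays 0.
inline std::uint32_t select_u32_neg(bool test, std::uint32_t a, std::uint32_t b)
{
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(test);
    return (a & mask) | (b & ~mask);
}
```

With these, iR = select_u32(bTest, iA, iB); does the same job as the two lines above, provided iA and iB are unsigned 32-bit values.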
Lovely stuff! Anyway, that concludes this relatively short post on branch avoidance in a trivial, but nevertheless rather common, situation. I suppose I should get on with making merry for Christmas, so to anyone reading this, Merry Christmas and Happy New Year!
This will be just a quick post to highlight the differences between two widely-used concepts that a number of people seem to have a little bit of confusion about. As CPUs and GPUs become more and more involved and the number of registers and execution units contained therein increases, more sophisticated methods of instruction scheduling within the GPU/CPU become necessary. Enter dual-issue and co-issue. Instructions can be dual-issued or co-issued only under a very particular set of circumstances: there must exist no dependencies between the instructions that are to be dual- or co-issued. That is to say, the instructions must not depend on the results of one another and, in the case of dual-issue, the instructions must be suitable for issue to different ALUs, i.e. in the case of an SPU, you can’t issue an instruction that runs on an ALU in the odd pipe to the even pipe (and vice versa).
When an instruction is co-issued (in a pixel-pipeline arithmetic unit for example), two separate instructions will run on different parts of the same SIMD register at the same time. The way the register is split is up to the programmer (or more often than not, his/her best buddy “the optimising compiler”). For example, a programmer can choose to run one instruction on the first three components of the register and a separate instruction on the last (a 3:1 split) in the same clock cycle(s), or perhaps run one instruction on the first two components of the register and something else on the latter two components. Co-issue has been a feature of GeForce GPUs since the 6 series, but only as 3:1 and 2:2 co-issue. This is pretty flexible if you think about it, as one could easily swizzle the components of the vector to support the 1:3 case. One final thing about co-issue that should be noted is that it makes no sense to talk of co-issue in the context of non-SIMD registers: as there is nothing to split, only dual-issue is applicable.
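To make the splits a little more concrete, here is a rough C++ model of which components each instruction touches in the 3:1 and 2:2 cases. Everything in it is hypothetical: Vec4, co_issue_3_1 and co_issue_2_2 are names invented for this sketch, and the multiply and add are arbitrary stand-ins for “one instruction” and “a separate instruction”. Plain C++ says nothing about what the hardware schedules in a given clock, so this only illustrates the lane split, not the actual issue behaviour.

```cpp
// Hypothetical four-component SIMD register, modelled as a plain struct.
struct Vec4 { float x, y, z, w; };

// Models a 3:1 split: a multiply on the first three components and a
// separate add on the last one. The two operations are independent of each
// other, which is the condition co-issue requires.
inline Vec4 co_issue_3_1(const Vec4& r, float scale, float bias)
{
    return Vec4{ r.x * scale, r.y * scale, r.z * scale, r.w + bias };
}

// Models a 2:2 split: a multiply on the first two components and an add on
// the latter two.
inline Vec4 co_issue_2_2(const Vec4& r, float scale, float bias)
{
    return Vec4{ r.x * scale, r.y * scale, r.z + bias, r.w + bias };
}
```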
When instructions are dual-issued, different execution units are used to execute them, but the execution occurs at the same time. While GPUs do support this, a clearer example for the purposes of this post can be found in newer CPUs. Consider the CBE’s (Cell Broadband Engine’s) PPU (Power Processing Unit). The PPU contains three different execution units, the FXU (FiXed point Unit), the FPU (Floating Point Unit) and the VMX (Altivec), each responsible for processing different types of instructions. As the name suggests, the first unit is for fixed-point integer data, the second unit is for floating point data and the third and final is for vectorised, SIMD calculations. Since these are essentially different pieces of silicon, each with its own register file, it is possible to execute different instructions on the different units in parallel (dependencies permitting, of course). As mentioned before, the same is also true of the units inside the odd and even pipes in the CBE’s SPU, and of different ALUs within a GPU’s SPs. On SPU, certain (more favourable) combinations of instructions can result in a dual-issue to the SXU’s odd and even pipes iff they run on ALUs in different pipes. On GPU, texture fetch instructions and arithmetic instructions can be dual-issued to different execution units within the same stream processor (under a unified shader architecture; read “vertex/pixel pipeline” for more old-school cards). It is common practice for skilful shader writers to hide arithmetic instructions behind the latency of independent texture fetches through the use of dual-issue.
Until next time…