Well, can it really do Asynchronous Shading?
Yes. Both the GCN and Maxwell architectures are capable of Asynchronous Shading via their shader engines.
GCN uses 1 graphics engine and 8 shader engines with 8-deep command queues, for a total of 64 queues.
Maxwell uses 1 graphics engine and 1 shader engine with a 32-deep command queue, for a total of 32 queues (31 usable in graphics/compute mode)
Both GCN and Maxwell architectures claim to use context switching/priority at the shader engine to support Asynchronous Shader commands.
Prove it
Well, some guy on Beyond3d's forums made a small DX12 benchmark. He wrote some simple code to fill up the graphics and compute queues to judge if GPU architecture could execute them asynchronously.
He generates 128 command queues and 128 command lists to send to the cards, and then executes 1-128 simultaneous command queues sequentially. If running increasing amounts of command queues causes a linear increase in time, this indicates the card doesn't process multiple queues simultaneously (doesn't support Async Shaders).
He then released an updated version with 2 command queues and 128 command lists, many users submitted their results.
On the Maxwell architecture, up to 31 simultaneous command lists (the limit of Maxwell in graphics/compute workload) run at nearly the exact same speed - indicating Async Shader capability. Every 32 lists added would cause increasing render times, indicating the scheduler was being overloaded.
On the GCN architecture, 128 simultaneous command lists ran roughly the same, with very minor increased speeds past 64 command lists (GCN's limit) - indicating Async Shader capability. This shows the strength of AMD's ACE architecture and their scheduler.
Interestingly enough, the GTX 960 ended up having higher compute capability in this homebrew benchmark than both the R9 390x and the Fury X - but only when it was under 31 simultaneous command lists. The 980 TI had double the compute performance of either, yet only below 31 command lists. It performed roughly equal to the Fury X at up to 128 command lists.
Furthermore, the new beta of GameworksVR has real results showing nearly halved render times in SLI, even on the old GTX 680. 980's are reportedly lag-free now.
lower is better:
Conclusion / TL;DR
Maxwell is capable of Async compute (and Async Shaders), and is actually faster when it can stay within its work order limit (1+31 queues). Though, it evens out with GCN parts toward 96-128 simultaneous command lists (3-4 work order loads). Additionally, it exposes how differently Async Shaders can perform on either architecture due to how they're compiled.
These preliminary benchmarks are NOT the end-all-be-all of GPU performance in DX12, and are interesting data points in an emerging DX12 landscape.
Quelle:
www.reddit.com