EVERYTHING TECHNICAL ABOUT METRO EXODUS UPGRADE FOR PLAYSTATION 5 AND XBOX SERIES X|S
IT’S OUT!
We’re thrilled to let you know that the Metro Exodus next-gen upgrade for PlayStation 5 and Xbox Series X|S is OUT NOW. This is a perfect chance to dive into Metro Exodus, since this upgrade includes the generational advancements you will read about further in this blog. On top of that, it’s a free upgrade for current owners.
Moreover, we’re happy to share that Metro Exodus Complete Edition is also out now. This is the definitive physical version for Xbox One|Xbox Series X and PlayStation 5, and includes the Metro Exodus base game as well as the two story expansions – The Two Colonels and Sam’s Story. Please check with your local retailer for availability.
Continuing the discussion of the technological improvements we presented in our previous blog on Metro Exodus PC Enhanced Edition, this blog will dive into our adventures bringing the Enhanced Edition features of Metro Exodus to the 9th generation consoles.
IMPLEMENTING RAY TRACING ON GEN 9 CONSOLES
When we found out that the Gen 9 consoles would support Ray Tracing, we were very excited. This level of technical capability opens up many options for us regarding how we advance our Ray Tracing technology and bring that technology to as many fans of our games as possible. It has also established new approaches for how we can build our future titles. For a while, there was a real concern that the consoles, however amazing they sounded, wouldn't be able to provide high-quality Ray Tracing at a decent price point, so there were fears that we would need to support two completely different classes of technology in tandem.
The big unknown was exactly how much of a performance cost Ray Tracing on consoles would incur. If we compare, for example, the 9th generation Xbox Series X to the 8th generation Xbox One X, Xbox Series X has a Graphics Processing Unit (GPU) that is capable of performing at approximately twice the speed of that of Xbox One X. The metric we measure this by is known as the teraflop (trillion floating point operations per second), essentially the number of simple arithmetic operations that the processing unit can perform in a second. For instance, Xbox Series X is capable of performing approximately 12 teraflops and Xbox One X operates at about 6 teraflops. It looks like a generational leap in performance until you realize that while you are doubling the speed of the processor you are also asking it to do its job in half the time, since you are now targeting 60FPS instead of the 30FPS that we were targeting previously.
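To put rough numbers on that (a back-of-envelope sketch only; as the next paragraph explains, real performance depends on far more than teraflops):

```cpp
#include <cstdio>

// Back-of-envelope only: per-frame compute budget = peak throughput / target FPS.
int main() {
    const double oneX_tflops = 6.0,     oneX_fps = 30.0;    // Xbox One X at 30FPS
    const double seriesX_tflops = 12.0, seriesX_fps = 60.0; // Series X at 60FPS
    std::printf("Xbox One X:    %.2f TFLOP per frame\n", oneX_tflops / oneX_fps);
    std::printf("Xbox Series X: %.2f TFLOP per frame\n", seriesX_tflops / seriesX_fps);
    // Both print 0.20: double the throughput at half the frame time
    // leaves the per-frame budget essentially unchanged.
    return 0;
}
```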
The teraflops metric is a good proxy for the GPU performance on a compute-heavy game such as Metro Exodus, but it is not everything. Different issues can still be more or less pronounced on different devices. Bottlenecks, elements of a given feature which rely heavily on one part of the hardware and significantly underutilize others, can occur in different places and for different reasons. On the surface though, the default, non-raytraced port of Metro Exodus seemed to be quite doable. It would have been a reasonably like-for-like comparison: double the power, half the time, no major concerns there. Adding Ray Tracing into the mix, however, was still questionable at the time.
That was the riskiest part of the decision, but we decided to pursue the Ray-Tracing-only route until we could test the real performance on real devkits. In any case, it was clear that we had to make everything run faster and be more scalable at the same time no matter what course we ended up taking.
PLATFORM-SPECIFIC TECHNOLOGICAL IMPROVEMENTS
DDGI
DDGI (as detailed in our previous blog post), the system that now underpins our recursive bounce functionality, was originally implemented primarily because it was a much cheaper feature computationally, and in principle it could have stood in as a console-based replacement for our precise per-pixel Ray Traced Global Illumination (RTGI) if we fell short of our performance targets. Given our concerns about the capabilities of hardware that none of the platform holders had yet released to developers, and which was therefore still completely untested, erring on the side of caution was definitely the sensible decision. Ultimately though, actual testing told a different story, and we were able to set these reservations aside: all Gen 9 platforms were quite capable of running the full implementation of per-pixel RTGI. See our previous article for more detail on our new cross-platform RTGI implementation.
Denoiser
As one of our earliest enhancements, we overhauled our denoiser. The original raytraced Metro Exodus’ denoiser was of high quality, but it was quite heavy on the GPU and its cost was far from constant. While this improvement initially benefited the PC version, it was nonetheless a key part of our ability to hit our console targets. That said, there were actually a number of general Ray Tracing performance optimizations developed specifically for the consoles which later managed to find their way back to the PC.
Ray-binning
How we organized and dispatched our Ray Tracing tasks benefited from an improvement known as ray-binning. This system collects together groups of rays which are cast (travelling, if you prefer) in roughly similar directions to one another and processes them together in batches. We refer to such batches as thread groups: a group of processes operating in parallel on the GPU. Grouping rays according to their direction like this decreases the probability that the directions of the rays will become spatially divergent within any given thread group. Divergent rays, as such cases are known, are terrible for performance because each one quite literally goes off and does its own separate thing. They can’t share any data they find about the scene with other rays and effectively perform all of their calculations independently.
The key benefit we receive from grouping our rays according to their direction, then, is that it makes it more likely that many of the rays being processed in a single batch will be evaluated against the same pieces of scene geometry. Simply put, if they are pointing in the same direction, then it is more likely they will hit the same objects and so can share their findings within their group. They work faster if they cooperate. If we have a scenario where one or more rays land upon the same object in the scene (sample the same piece of geometry), then that group will ultimately have to load less information from memory, which can potentially reduce memory bandwidth pressure.
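For the curious, here is a minimal CPU-side sketch of the general idea; the cube-face binning scheme and the counting sort are illustrative choices, not our actual GPU implementation:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// A minimal sketch of direction-based ray binning (illustrative, not engine code).
struct Ray { float ox, oy, oz; float dx, dy, dz; };

// Quantize a (normalized) direction into one of 6*n*n bins by taking the
// dominant axis as a "cube face" and gridding the other two components.
uint32_t DirectionToBin(const Ray& r, uint32_t n) {
    float ax = std::fabs(r.dx), ay = std::fabs(r.dy), az = std::fabs(r.dz);
    uint32_t face; float u, v, m;
    if (ax >= ay && ax >= az) { face = r.dx > 0 ? 0 : 1; m = ax; u = r.dy; v = r.dz; }
    else if (ay >= az)        { face = r.dy > 0 ? 2 : 3; m = ay; u = r.dx; v = r.dz; }
    else                      { face = r.dz > 0 ? 4 : 5; m = az; u = r.dx; v = r.dy; }
    uint32_t iu = uint32_t((u / m * 0.5f + 0.5f) * (n - 1) + 0.5f);
    uint32_t iv = uint32_t((v / m * 0.5f + 0.5f) * (n - 1) + 0.5f);
    return (face * n + iu) * n + iv;
}

// Counting sort of rays into bins, so that each batch (thread group)
// processes a consecutive run of rays with similar directions.
std::vector<Ray> BinRays(const std::vector<Ray>& rays, uint32_t n) {
    const uint32_t binCount = 6 * n * n;
    std::vector<uint32_t> counts(binCount + 1, 0);
    for (const Ray& r : rays) counts[DirectionToBin(r, n) + 1]++;
    for (uint32_t b = 1; b <= binCount; ++b) counts[b] += counts[b - 1];
    std::vector<Ray> sorted(rays.size());
    for (const Ray& r : rays) sorted[counts[DirectionToBin(r, n)]++] = r;
    return sorted; // consecutive rays now point roughly the same way
}
```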
Bounding Volume Hierarchy (BVH)
Objects stored in our scenes’ Bounding Volume Hierarchy (BVH) now support Levels-of-Detail (LoDs) to ease scene reconstruction and to reduce the amount of geometry that rays are ultimately tested against.
How you organize your scene’s geometry is critical to real-time Ray Tracing. If you have a warehouse full of items, it’s no good just tipping them in by the truckload and then having to sift through the mess every time you want to take something out. You need to have some kind of a system to organize everything. In a game, you don’t want to test against every single polygon (there are literally billions of those). So, you subdivide your scene into partitions called Bounding Volumes (literally a little rectangular box into which you put a handful of polygons). You can then check to see if a ray even hits one of these Bounding Volumes (boxes) before you check each polygon that it contains. If you don’t hit the box, you certainly won’t hit its contents. That’s not enough though. If you can gather together a handful of polygons, you can gather together a handful of the Bounding Volumes into larger groups (put them in a bigger box) and test that group first. The higher up in this nested hierarchy you go, the more of the scene you can disregard from your calculations with a single test against one of the larger partitions: by subdividing your scene this way, you can ask if a ray even approaches whole regions of the game world before you go in and check against the finer details. To wrap up the warehouse analogy, put an item in a box, put the box on a shelf, locate the shelf in an aisle, inside a building with an address. Then you can tell where the item you are looking for is, among the millions inside, just from the sign on the door.
This process of dividing the scene into different regions with different pieces of geometry in each exponentially reduces the time taken for any given trace, but it does mean that much of the cost of Ray Tracing is now incurred by evaluating the potential intersections between rays and the boundaries of the scene partitions (the Bounding Volumes). We call those particular tests Bounding Box tests (the process really is very analogous to boxing up bigger and bigger parts of the scene in memory in order to make them easier to organize, address, find, and sort later on).
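To make those Bounding Box tests concrete, here is a simplified CPU-side sketch of a BVH node and its traversal. On Gen 9 hardware the actual acceleration structures are built and traversed with dedicated hardware support, so treat this purely as an illustration of the principle:

```cpp
#include <vector>

// A simplified BVH node (illustrative; real acceleration structure formats
// are defined by the platform and traversed in hardware).
struct AABB { float min[3], max[3]; };
struct Node {
    AABB box;
    int left = -1, right = -1;   // child node indices; -1 marks a leaf
    std::vector<int> triangles;  // triangle indices stored at a leaf
};

// Slab test: does the ray hit the box at all? If not, everything inside
// the box - and every box nested within it - is skipped in one go.
bool RayHitsBox(const float orig[3], const float invDir[3], const AABB& b) {
    float tMin = 0.0f, tMax = 1e30f;
    for (int axis = 0; axis < 3; ++axis) {
        float t0 = (b.min[axis] - orig[axis]) * invDir[axis];
        float t1 = (b.max[axis] - orig[axis]) * invDir[axis];
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tMin) tMin = t0;
        if (t1 < tMax) tMax = t1;
        if (tMin > tMax) return false; // the slabs never overlap: a miss
    }
    return true;
}

// Collect the triangles a ray could possibly hit, culling whole subtrees
// with a single Bounding Box test at each level of the hierarchy.
void Traverse(const std::vector<Node>& nodes, int idx,
              const float orig[3], const float invDir[3],
              std::vector<int>& candidates) {
    const Node& node = nodes[idx];
    if (!RayHitsBox(orig, invDir, node.box)) return; // one test culls the subtree
    if (node.left < 0) {                             // leaf: gather its triangles
        candidates.insert(candidates.end(),
                          node.triangles.begin(), node.triangles.end());
        return;
    }
    Traverse(nodes, node.left,  orig, invDir, candidates);
    Traverse(nodes, node.right, orig, invDir, candidates);
}
```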
Reducing the geometric complexity of individual objects can then potentially reduce the number of levels of the hierarchy required to optimally partition that object (divide it up so that you can sort through geometry as fast as possible), as well as potentially making the reconstruction of that hierarchy faster when objects in the scene move. This organizational structure underpins everything in Ray Tracing, so it is vital that we do all we can to make sure that it is built well, as quickly as possible, and as early in the frame as possible. Employing aggressive Level-of-Detail optimizations also helped our rasterization performance, so this was a double win.
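As a hypothetical illustration of the Level-of-Detail side of this (the distance thresholds below are made-up numbers for the sketch, not our tuned values):

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: pick a coarser LoD for distant objects before (re)building
// their entry in the acceleration structure.
struct Mesh { std::vector<float> vertices; };              // one LoD of an object
struct Object { std::vector<Mesh> lods; float distance; }; // lods[0] = finest

const Mesh& SelectRayTracingLod(const Object& obj) {
    // Made-up thresholds: finest mesh up close, coarser ones further away.
    size_t lod = obj.distance < 10.f ? 0 : obj.distance < 40.f ? 1 : 2;
    if (lod >= obj.lods.size()) lod = obj.lods.size() - 1;
    return obj.lods[lod]; // fewer triangles -> shallower subtree, faster rebuild
}
```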
FP16
Half-precision floating point arithmetic (or FP16) refers to arithmetic operations which are performed using half of the standard number of bits. Reducing the number of bits used to store individual values reduces the precision and accuracy of those values, but also greatly reduces the physical complexity of the hardware needed to perform operations on them. Sometimes that extra precision is absolutely essential to the thing you are trying to calculate; sometimes it is not, and there are serious performance gains to be made if you can identify what level of precision you actually need.
For example, I don’t need to know how far my commute to work is down to the nearest millimeter: kilometers will usually do just fine in such a case. In a game setting, if a large object is out of place by the width of a human hair, it doesn’t usually show any more than if it was out of place by just a few atoms. No-one looks that closely – no-one even can. A 12 teraflop GPU using full precision arithmetic is actually a 24 teraflop GPU when doing exclusively FP16 math, although that is under ideal circumstances, of course. So, we almost always see performance gains whenever and wherever we find an opportunity to reduce the precision of our arithmetic.
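A tiny example of what that trade-off looks like in practice (this uses the _Float16 extension available in Clang and GCC, standardized as std::float16_t in C++23; it is a sketch, not engine code):

```cpp
#include <cstdio>

// FP16 keeps a 10-bit mantissa, so values carry roughly 3 decimal digits.
int main() {
    float full = 3.14159265f;       // FP32: plenty of precision
    _Float16 half = (_Float16)full; // FP16: rounds to 3.140625
    std::printf("FP32: %.7f\nFP16: %.7f\n", full, (float)half);
    // An error of ~0.001 is invisible in a ray direction or a color value,
    // but could mean visible drift in a large world-space position -
    // which is why precision is chosen per calculation, not globally.
    return 0;
}
```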
Temporal Reconstruction and Up-sampling
Temporal Reconstruction and Up-sampling was a component that was critical to get right, as it was quite clear from the beginning that the performance (with Ray Tracing) would be much more variable than before. Temporal Reconstruction is the process of taking data from the final images produced by previous frames to help identify how features of the current image should look. Up-sampling is the process of taking a smaller image and blowing it up to a larger size while attempting to keep the same level of quality (not blurring or distorting it in any way). Our console versions have always targeted a rock-solid, V-Sync-locked framerate, which means we need a way to stabilize frame render time even as the computational load varies considerably. Rendering at lower internal resolutions and upscaling provides us with a solution to this problem, but reconstructing a higher resolution image from the data available in a lower resolution one requires a lot of work to make sure that the image quality remains as high as possible. A great deal of effort has gone into balancing this technique both for the final quality (which is to say the quality of the entire output image) and for the speed at which the process operates.
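Here is a heavily simplified sketch of the two halves of this process; the real pipeline also reprojects the history buffer through motion vectors and rejects stale samples, none of which is shown here:

```cpp
#include <cstddef>
#include <vector>

// Heavily simplified, single-channel images; illustrative only.
struct Image { int w = 0, h = 0; std::vector<float> pixels; };

// Naive bilinear up-sampling from a lower internal resolution to output size.
Image Upsample(const Image& src, int dstW, int dstH) {
    Image dst{dstW, dstH, std::vector<float>(size_t(dstW) * dstH)};
    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < dstW; ++x) {
            float sx = (x + 0.5f) * src.w / dstW - 0.5f; if (sx < 0) sx = 0;
            float sy = (y + 0.5f) * src.h / dstH - 0.5f; if (sy < 0) sy = 0;
            int x0 = int(sx), y0 = int(sy);
            int x1 = x0 + 1 < src.w ? x0 + 1 : x0;
            int y1 = y0 + 1 < src.h ? y0 + 1 : y0;
            float fx = sx - x0, fy = sy - y0;
            float top = src.pixels[y0 * src.w + x0] * (1 - fx)
                      + src.pixels[y0 * src.w + x1] * fx;
            float bot = src.pixels[y1 * src.w + x0] * (1 - fx)
                      + src.pixels[y1 * src.w + x1] * fx;
            dst.pixels[size_t(y) * dstW + x] = top * (1 - fy) + bot * fy;
        }
    return dst;
}

// Temporal accumulation: blend the up-sampled current frame into the running
// history. A small alpha trades responsiveness for stability (less noise).
void Accumulate(Image& history, const Image& current, float alpha = 0.1f) {
    for (size_t i = 0; i < history.pixels.size(); ++i)
        history.pixels[i] += alpha * (current.pixels[i] - history.pixels[i]);
}
```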
Other noteworthy improvements
A fair amount of the work of bringing a game to console isn’t in the implementation of large features, though. It is in the fine tuning to account for the specific nature of the hardware you are working on. An example could be little tweaks to make pieces of code more friendly towards the machine’s memory configuration. The denoiser’s recurrent blur (the gradual refinement of lighting information from frame to frame) and our pre-trace pass (a fast but less accurate, screen-space equivalent of Ray Tracing that we use to capture the results of very short rays) were configured to take account of the exact cache sizes of the device at hand.
Some enhancements became available to us as optional platform features on the Gen 9 GPUs, such as Draw Stream Binning Rasterization (DSBR), which provides additional means of culling hidden geometry and reduces draw-primitive pressure and memory bandwidth. Delta Color Compression (DCC), a lossless form of color compression, can be enabled on a per-render-target basis, again to alleviate memory bandwidth pressure. DCC did actually feature on Xbox One X, but in a more limited form; the updated version has a lower performance impact, making it a more viable option. Variable Rate Shading (VRS) is another feature that has only become available this generation, but which can, with relatively little effort, almost always provide some degree of performance increase.
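As a flavour of how little effort VRS can take, here is a hedged example using the PC D3D12 API (the console APIs expose equivalent controls in their own ways); the function and the geometry it draws are illustrative, not from our codebase:

```cpp
#include <d3d12.h>

// Illustrative: per-draw Variable Rate Shading via ID3D12GraphicsCommandList5.
// Assumes 'cmdList' was created on a device reporting VRS Tier 1 or higher.
void DrawDistantGeometryCoarsely(ID3D12GraphicsCommandList5* cmdList) {
    // Shade once per 2x2 pixel block: roughly 4x fewer pixel shader
    // invocations on geometry where the quality loss is hard to notice.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, nullptr);
    // ... issue draw calls for distant / low-frequency geometry here ...
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, nullptr); // restore
}
```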
WORKING WITH THE HARDWARE
All of this was originally developed on Nvidia’s RTX 20-series GPUs, with foreknowledge of the features that would be available to us on the upcoming consoles, while we were waiting for devkits to be manufactured and distributed. Once we had access to the physical units, we were able to begin porting this work over.
The first frames running on actual devkits revealed that we had been somewhat overly optimistic in our initial estimations of the expected Ray Tracing performance. That was not really an issue though, as the denoiser’s frontend (its temporal accumulation component) had already been made scalable, accepting input at any resolution while still running at full resolution itself. We halved the per-pixel ray count, and it soon became apparent to us that all of the major features we were hoping to implement would actually be feasible!
Then there were several months of continuous optimizations, async compute rebalancing, and hunting the bugs that always crawl out when new hardware and new software development kits (SDKs) are introduced. The SDKs were constantly changing as well, bringing in enhancements, some of which we had specifically requested that the platform holders implement. On the fun side, due to the specific way we utilize Ray Tracing on consoles, we are tied to exact formats for the acceleration structures in our BVH: formats that were changing with almost every SDK release. Constant reverse-engineering became an inherent part of the work-cycle. :)
Speaking of async compute, our typical 16ms frame has about 12ms of work on the async queue, and all of those workloads were bottleneck-balanced for Gen 9 GPUs according to what exactly was running on the graphics queue. That made a huge difference performance-wise on the Gen 9 consoles (on the order of 30% faster). This is evident in the PC Enhanced version as well, with a typical 18% performance gain on Nvidia hardware compared to running with async compute off.
As for our own technological progress, the first frames rendered in a relatively simple level (the game’s menu screen) were running at an internal resolution of less than 2 megapixels at 60FPS. This gradually climbed up and up with each week of work to the current 4K at 60FPS output resolution, with the typical internal dynamic resolution hovering around 5 megapixels in our heaviest scenes (on Xbox Series X and PlayStation 5).
FUTURE CONSOLE DEVELOPMENT
We saw many strides in performance during this phase of console optimization, many of which gave us cause to rethink our approaches to certain solutions. We remained very conscious of the fact that we were aiming to have the consoles provide a consistent 60FPS experience for this title, and, with that in the back of our minds, the gradual performance improvements allowed us to include more and more features. Despite superficial differences in the layout and approach of the platform-specific Application Programming Interfaces (APIs), all platforms are now running remarkably similar CPU and GPU code, and have managed to maintain a very consistent feature set.
The groundwork has been laid, and we have successfully brought a product to the 9th generation consoles complete with essentially our entire Ray Tracing feature set. This sets a baseline for this generation’s future projects. We mentioned that we had initially thought of some features as potential fallback solutions to be maintained alongside superior and steadily evolving equivalents on PC. This wasn’t the case, but it could have been if the consoles weren’t as good as they are. If it had been, many members of the team would have been hit by the massive increase in workload that comes with working on two separate and distinct systems in parallel. Instead, we now have a new set of standards to base our work on that are consistent across all target platforms.
As it stands then, we can say for sure that projects of this generation, across all targeted platforms, will be based on this raytraced feature set. That is great news for the end result: it is allowing us to produce scenes with the highest level of graphical fidelity we have ever achieved, and that is what the public gets to see, though these features are just as important behind the scenes.
There is a reason why we have always been so vocally critical of the idea of baking assets (pre-generating the results of things like lighting calculations) and shipping them as immutable monoliths of data in the game’s package files, rather than generating as much as possible on the fly: everything that you pre-calculate is something that you are stuck with. Not “stuck with” in the sense that if it is wrong it can’t be fixed (everyone loves a 50GB patch, after all), but “stuck” in a much more limiting sense: any part of your game, any object in the scene that relies on baked assets, will be static and unchanging. You won’t be able to change the way it is lit, so you have to be overly cautious with decisions about how the player can affect dynamic lights; you won’t be able to move it, so you disable physics on as much as possible; and the player can’t interact with it in interesting ways, so you pass that problem on to UX design.
The more you have to rely on baked assets for your scenes, the more you restrict your game design options, and the more you risk your environments feeling rigid and lifeless. Perhaps the biggest advantage that Ray Tracing brings is that it gives game developers a huge boost in the direction of worlds that are truly, fully dynamic, with no dependencies on pre-computed assets whatsoever. There are still other places where similar problems need to be solved, but lighting was one of the biggest and most all-encompassing examples of the lot, and Ray Tracing solves it.
It doesn’t just come down to what you end up shipping, either. Game development is by its very nature an iterative process. You need to have some plan for where you want to take a project from the start, of course, but once you begin working on assets and developing features you always test them as part of the larger context of the game, and more often than not this leads to tweaks, refinements, and balancing. Even that might not be the end of it: other features can come along, changing the experience in myriad different ways and leading to yet more alterations and adjustments. For this process to work, developers need an environment to work in that is intuitive, responsive, and as close a representation of the main game as possible. Our editor has always run basically the same simulation as the final game, but the technological advancements seen in this generation have streamlined the process significantly. Testing environments are quicker to set up, with fewer dependencies on assets or on the work of other departments. Changes in visual design are visible (in their final form) immediately. The physical simulation on the whole feels more like a sandbox in which ideas can be tested and iterated upon. This makes for a more comfortable and fluid development experience, more conducive to creativity and experimentation.
The true effects of all this will take longer to be realized. The boundaries of what you can (and can’t) do in a video game have shifted, and design practices will take a while to feel them out and fill them. But ultimately, what we see is the promise of more dynamic and engaging game worlds, with fewer limitations, which can be developed in faster and more intuitive ways. On the player side, all that hopefully translates to more content without the usual associated spiraling development costs, and to richer, more believable game experiences.
We’re looking forward to working further with all these exciting improvements, and developing them in our future projects!
- The 4A Games Team