modified: Sunday 24 January 2021
Reply: The software of my dreams (3D to 3D capture and world reconstruction)
This is a reply to Ross Scott's video The software of my dreams where he considers ways of extracting 3D worlds from video games so they can be explored in different ways.
This really struck a personal chord. There are so many times I've wanted to see a game world from a different perspective. What's up that hill? What can I see from there?
I think it's worth splitting up the pathway to 3D world capture into a few simpler steps. Ideally the end goal would be an all-in-one tool, but getting there will be much easier if different people can work on the different tasks independently. A single tool from the get-go would be far too cumbersome.
Here are my best guesses at the pathways and outcomes (both good and bad):
There is a lot to get through. If you get bored of the text: scroll down to the mildly-related pretty screenshots at the end.
3D capture (3D-to-3D)
A 3D capture of a game can be thought of as similar to a video file, but you're watching actual textured meshes moving around on your screen instead of pictures of them. A recorded game demo is one (very limited) type of 3D capture; we can go further and record everything that is sent to your GPU.
In a nutshell:
- The original game is 3D
- We want to record 3D data (vertices, triangles as well as the attached materials/shaders/textures) as you play
- Tools to do this are already available and easily accessible for ordinary gamers, but the capture file size is ridiculous (about 0.5GB per min in my test below)
Some quick attempts at using 3D-to-3D capture tools
As an example I wanted to see how hard 3D capture is with existing tools. My goal was to record a bit of Half-Life 1, in particular one part of the game that many people hate but I think is full of hidden worldbuilding and charm: Xen.
Note: I'm using Wine on Linux here, so I had to add library redirects for things like opengl32.dll, d3d8.dll, d3d9.dll, ddraw.dll, etc so the recording tools could replace them. I ran the tools at the Windows level (as an actual Windows user would); it is possible to run them at the post-graphics-translation (Linux) level but things get messier the further away you get from the game.
Here was my config:
Note the slightly annoying syntax for command line arguments, but that's about all I can complain about.
Here's a video recording of my capture being replayed (download link):
The performance is not great: we're well below 60FPS and often below 30. I should have turned on the game's FPS counter (todo for next time). I suspect this could all be improved, but performance is probably not a high priority for these debugging-aimed tools. My processor is also 8 years old and can't encode traditional 2D game footage without taking a notable hit to FPS anyway.
Capture file size is where this gets particularly interesting. That's about a minute of gameplay and it gets saved as a 540MB file. If I compress this using a tool like 7z it gets down to about 120MB (I suspect there are a lot of duplicate calls, meshes, etc; especially if immediate mode transfers are being intercepted). That's still far too big for any sort of reasonable gameplay capture. At the uncompressed rate a terabyte of storage would only last about 30 hours with this particular game, and this will get much worse for games with much more geometry & textures.
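As a rough sanity check on those figures, here is a small back-of-envelope sketch. It assumes the data rate of my one-minute sample stays constant for a whole play session (which it almost certainly won't) and uses a decimal terabyte; the function names are just made up for illustration.

```python
# Back-of-envelope storage estimate using the figures quoted in this post.
RAW_MB_PER_MIN = 540         # capture file for about one minute of Half-Life 1
COMPRESSED_MB_PER_MIN = 120  # the same capture after 7z compression


def gb_per_hour(mb_per_min: float) -> float:
    """Convert a capture rate in MB/minute to GB/hour (decimal units)."""
    return mb_per_min * 60 / 1000


def hours_per_terabyte(mb_per_min: float, terabyte_mb: float = 1_000_000) -> float:
    """Hours of capture that fit in one decimal terabyte at a constant rate."""
    return terabyte_mb / mb_per_min / 60


for label, rate in (("raw", RAW_MB_PER_MIN), ("7z", COMPRESSED_MB_PER_MIN)):
    print(f"{label}: {gb_per_hour(rate):.1f} GB/hour, "
          f"{hours_per_terabyte(rate):.0f} hours per TB")
```

Either way you slice it, that is tens of gigabytes per hour for a game from 1998.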
Sound is also not recorded (it's not sent to the graphics card). Traditional recording methods need to be used for that (and opus-encoded sound is ridiculously tiny compared to 2D or 3D recordings anyway).
RenderDoc: didn't work in this case
RenderDoc claims to be free, supports a lot of 3D APIs and has a fancy GUI. Screenshot stolen shamelessly from their website (spoiler: I couldn't get it to work as prettily as they show):
The game started and ran like molasses. It took a good 10 seconds for the hl.exe window to even open, then another 30 or so to get into the level. Framerate was less than 30. Odd black artefacts were across the window.
I played for a bit and then closed the game. RenderDoc had nothing:
What? That's not right.
I tried a lot of other settings and permutations, but it turns out the proof is in the manual:
Translation: this will never work with HL1! hl.exe uses an OpenGL renderer from well before the OpenGL 3 days.
(A big thank you to whoever writes and organises this documentation, it's embedded directly in the program and it was really easy to find this "Gotchas" heading.)
A bit of translation for those who have not dealt with graphics APIs before: when programmers talk about "legacy" or "deprecated" OpenGL features they mean the "actually easy to use and extremely common features". The non-deprecated APIs have better performance but also a ridiculously higher barrier to entry. I found that out the hard way after years of staring at complex OpenGL tutorials that tell you to do stuff because it's the "right" way, even though the "legacy" ways are 100x better for learning and general mucking around. No, I don't want to write shaders in my first OpenGL program. Good teaching gets students making something that works and gives them feedback early on, rather than making them work for hours or days before they get to see the fruits of their labor :( Different tools suit different situations!
RenderDoc probably works fine for more modern games but that doesn't suit my interests. I prefer old game worlds and I think they're a better start for 3D capture projects because:
- They're simpler in terms of data and animation.
- There are fewer ethical & legal issues around their capture (discussed later)
- I have fonder memories of such worlds :)
Other 3D cap tools
There are a lot more, see the table on Apitrace's page. It would be worth trying more of them just to see what the landscape is currently like and what might be best to use as a starting point.
Unfortunately it's already taken me 2 days of work on this article, so I won't be diving much deeper.
2D capture routes: skip!
A 2D capture of a game is traditional video and screenshots. You start with a 3D game and record it in 2D, hence "3D-to-2D". In summary:
- Very accessible (almost anyone can record gameplay footage)
- Potentially footage from the web can be used as sources (useful for now-unplayable (dead) games)
- A lot of information is lost in this process: if you never see something (eg rooftops, tunnels) then it doesn't exist in the footage, and compression artefacts smudge what you do see.
To try and recreate a 3D world from video you need to go from 2D-to-3D. This is very hard and nasty:
- It's very computationally intensive (ie not accessible to laymen)
- The algorithms are complex & cutting edge (ie too hard for ordinary devs to contribute to)
- A lot of stuff still has to be guessed by the algos
2D-to-3D is slowly getting better, but it's not going to be achievable by ordinary joes like us in anything resembling years. Meanwhile 3D-to-3D capture is already mostly accessible and probably doesn't require any groundbreaking discoveries to turn into practical 3D world reconstruction systems.
Recommendation: skip the whole 3D-to-2D-to-3D stuff, go straight for 3D-to-3D.
Problems of 2D-to-3D in more detail
Bad accessibility to run. Ordinary computers are more than capable of doing 3D-to-3D capture in realtime whilst you play, but the computing resources required to do 2D-to-3D are enormous. As soon as you see anything related to "AI" and graphics that manages to pull distorted bunnies out of hats: assume it requires a farm of computers and lots of crushed souls as fuel.
Bad accessibility to develop. Any 3D graphics or game programmer (of which there are many!) can work on improving software that deals with vertices and polygons. Very few people in the world have the skills or resources to work on deep-learning or photogrammetry algorithms.
Details erratic in 2D video encoding. We have to deal with smudges and motion-error in video encoding, no-one has the disk space for lossless captures of everything. This makes the algorithms even harder, they have to separate the visual phantoms from the real in-game movement.
World loss in the 3D-to-2D conversion. There are always infinite 3D world solutions that meet any collection of screenshots or video, requiring your algorithm to guess which one is correct. Algorithms will get this wrong regularly, let alone the fact humans can't get it right all of the time either. Look at Escher's works for examples of this; but x1000 for every 3D detail in ways that you don't expect.
Even simple things like working out where the player is standing & looking for each frame of a video are hard problems to solve. Have a look at this naive attempt by Hugin's cpfind algorithm to work out how to arrange a bunch of screenshots into the one scene:
That's Sonic Adventure with a twist. The player stood at the same position for each of these shots, just with the view rotated. Games are a bad case for these particular algorithms because of the perfectly tiling textures.
3D-to-2D-to-3D is a workaround
We can already capture 3D-to-3D stuff, I recommend avoiding all of this craziness wherever possible. It's an infinite time sink and it's mostly an optional pathway for 3D world reconstruction.
Wait until such algorithms become mainstream and then reconsider. At the moment we're not there yet.
3D capture dedup and lossy compression
A key component to making long 3D captures practical is live compression of these captures. An implementation of this (as far as I'm aware) does not currently exist.
When you record with a 3D dumping tool you approximately get these things:
- vertex coords & lists
- triangle lists
A lot of this data gets redundantly repeated. The link between CPU and GPU is very fast and cheap to send lots of duplicate data over, it's not like the internet where you only want to download a file once.
As a result: 3D capture files are big. The exact numbers vary wildly depending on the game and the framerate (see my apitrace test further up this post where I get about 0.5G per min in Half-life 1).
I don't think any of this would require anything groundbreaking, I think any old game graphics programmers would be able to throw some options together that would be "good enough" for initial use.
ie algorithms to match vertex lists, triangle lists, textures and shaders to see if they've already been recorded.
This gets a bit more difficult when games cull parts of meshes out, so some algorithms will be needed to translate, shift and match pieces of things rather than just wholes.
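To give a feel for the simplest version of this, here is a minimal sketch of exact-match deduplication: hash every buffer the game sends and only store the ones we haven't seen before. The `BufferStore` class and `pack_vertices` helper are hypothetical names of mine, not part of any existing capture tool, and this only catches byte-identical repeats (it does nothing for the culled or partial meshes mentioned above).

```python
import hashlib
import struct


class BufferStore:
    """Stores each distinct buffer once; repeats become short references."""

    def __init__(self):
        self.blobs = {}   # digest -> raw bytes, stored only once
        self.stream = []  # one digest per recorded call, in order

    def record(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.stream.append(digest)
        return digest

    def raw_size(self) -> int:
        """Bytes the game actually sent, duplicates included."""
        return sum(len(self.blobs[d]) for d in self.stream)

    def stored_size(self) -> int:
        """Bytes we need to keep after deduplication (ignoring the digests)."""
        return sum(len(b) for b in self.blobs.values())


def pack_vertices(vertices):
    """Pack (x, y, z) tuples as little-endian 32-bit floats."""
    flat = [c for v in vertices for c in v]
    return struct.pack(f"<{len(flat)}f", *flat)


store = BufferStore()
rock = pack_vertices([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
crate = pack_vertices([(2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.0, 0.0)])
for _ in range(3):        # the same mesh re-sent on three frames
    store.record(rock)
store.record(crate)
print(store.raw_size(), "bytes sent,", store.stored_size(), "bytes stored")
```

The same idea should work for triangle lists, textures and shader source; fuzzy matching of partially culled meshes would need something smarter than a hash.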
Just like how we do 2D video encoding lossily: we can do the same for 3D data.
- identifying and throwing out (or heavily-lossily-encoding) regularly changing textures. Eg shadow maps and HUDs.
- identifying and throwing out of very small and rapidly changing meshes (eg fine hair, grass, etc).
- lossily compressing textures (jpeg everything! or use some newer algos that support alpha)
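As one possible way to spot the "regularly changing" textures in that list, here is a rough sketch that counts how often a texture's contents change between observations. The `TextureVolatility` class, the texture ids and the 50% threshold are all assumptions of mine for illustration; a real tool would need to tune this per game.

```python
import hashlib
from collections import defaultdict


class TextureVolatility:
    """Flags textures whose contents change on most of the frames we see them."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold          # fraction of observations that changed
        self.last_digest = {}               # texture id -> digest of last contents
        self.observations = defaultdict(int)
        self.changes = defaultdict(int)

    def observe(self, texture_id, pixels: bytes) -> None:
        digest = hashlib.sha256(pixels).digest()
        self.observations[texture_id] += 1
        previous = self.last_digest.get(texture_id)
        if previous is not None and previous != digest:
            self.changes[texture_id] += 1
        self.last_digest[texture_id] = digest

    def volatile(self) -> set:
        """Texture ids that changed in at least `threshold` of their transitions."""
        result = set()
        for tex, seen in self.observations.items():
            if seen > 1 and self.changes[tex] / (seen - 1) >= self.threshold:
                result.add(tex)
        return result


tracker = TextureVolatility()
for frame in range(10):
    tracker.observe("wall", b"\x80" * 64)          # static world texture
    tracker.observe("hud", bytes([frame]) * 64)    # redrawn every frame
print(tracker.volatile())
```

Textures flagged this way could be dropped or stored at a much lower quality, while the static ones get kept once.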
I suspect that rough aligning and pattern-matching of meshes will be easier than the SIFT-like algorithms we use for aligning 2D photos for panoramas. Whilst there is another dimension added: you don't have to compare everything with everything (because the sequential nature of the 3D/video capture hints as to what's local) and unique points (eg vertices at corners with angles A, B and C) are dead easy to find with almost 100% accuracy.
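Here is a small sketch of what I mean by a corner being a unique point: take the angles between every pair of edges leaving a vertex and sort them. That signature doesn't change when the mesh is moved or rotated, so it can be matched between captures. `corner_signature` is a made-up helper, and it assumes no zero-length edges.

```python
import math
from itertools import combinations


def corner_signature(vertex, neighbours, decimals=3):
    """Sorted angles (degrees) between every pair of edges leaving `vertex`.

    Invariant under translation, rotation and uniform scaling of the mesh.
    Assumes every neighbour differs from the vertex (no zero-length edges).
    """
    edges = [tuple(n[i] - vertex[i] for i in range(3)) for n in neighbours]
    angles = []
    for a, b in combinations(edges, 2):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(round(math.degrees(math.acos(cos)), decimals))
    return tuple(sorted(angles))


# A cube corner, and the same corner rotated 90 degrees about Z and shifted.
original = corner_signature((0, 0, 0), [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
moved = corner_signature((5, 5, 5), [(5, 6, 5), (4, 5, 5), (5, 5, 6)])
print(original, moved, original == moved)
```

Rounding the angles gives a bit of tolerance for float noise; a real matcher would probably bucket them more carefully.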
Juggernaut in the corner: online video platforms
Imagine watching a 3D-data stream of your favourite streamer, ie it's as if the game is actually being played in front of you on your computer.
- You can view at any screen res whilst keeping the edges of objects perfectly sharp.
- If your internet is poor then you can be served lower quality textures or vertex updates (so movement will seem a bit slurred/smoothed/lerped)
- You can change the camera angle and even view using your own custom two-camera stereoscopic (3D) config.
- Inspect competitive game videos for potential cheating & other oddities that are often hidden in 2D recordings.
- Change your perspective to be 3rd person or 1st person; or even the view you're watching from (depending on the source capture's breadth).
WebGL in browsers could be leveraged to make this as simple as going to our existing video sites.
Quality (data-rate) could still be variable: textures could be downscaled and vertex animations could be quantised (then lerp-smoothed). There are whole fields of very interesting optimisations possible here.
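As a minimal sketch of the quantise-then-lerp idea: squash each vertex coordinate into a 16-bit integer within a known bounding range, send fewer keyframes, and interpolate on the viewer's side. The function names, the 16-bit choice and the -100 to 100 range are assumptions for illustration only.

```python
def quantise(value, lo, hi, bits=16):
    """Map a float in [lo, hi] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)


def dequantise(q, lo, hi, bits=16):
    """Inverse of quantise, accurate to within half a quantisation step."""
    levels = (1 << bits) - 1
    return lo + q / levels * (hi - lo)


def lerp_vertex(a, b, t):
    """Linearly interpolate between two (x, y, z) positions, t in [0, 1]."""
    return tuple(pa + (pb - pa) * t for pa, pb in zip(a, b))


# One coordinate of a vertex inside a world that spans -100..100 units.
x = 12.345
q = quantise(x, -100.0, 100.0)
print(q, dequantise(q, -100.0, 100.0))

# Two received keyframes; the viewer fills in the frame halfway between them.
print(lerp_vertex((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), 0.5))
```

Dropping `bits` or sending keyframes less often trades accuracy for bandwidth, much like lowering the bitrate on a 2D stream.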
Why would big video hosting/streaming companies want this?
As it is: a video or stream of someone lets-playing Half-Life can easily be more than a gigabyte or so. That's bigger than the entire game itself. It would literally be better to send people a whole copy of the game, a recorded demo file and a recorded sound track to go with it.
Of course this would be silly for a number of reasons, starting with the developers of the game not being happy that you're handing free copies out. It's also not as easy as watching something in your web browser, as we do with existing video platforms.
At a technical level: 3D captures of games can be seen as "more easily optimised" versions of 2D captures, if the goal is 2D viewing at the end anyway. 2D video codecs have to guess motion and object boundaries on screen, but 3D captures already have that natively identified. 2D video encoding has to have a limited screen resolution, but 3D captures don't, instead limiting texture resolution and vertex position resolution (which can probably end up with prettier results per byte spent).
Getting big companies involved?
Getting big players (eg Twitch, Youtube) involved is a double-edged sword. They can and will do a lot of legwork if they see the money involved, but they're also likely to want to keep their implementations proprietary and controlled.
You would need to put a big early effort in to develop open standards and be the first to put some 3D video websites up to ensure we skip the whole closed-market-era for this technology. Once that's done it's possible that the big players will keep things like that, especially if the capture tools themselves are controlled by (eg) open-source communities. Maybe.
Asset rip-and-flip, the shadow of any better 3D tools
If people distribute 3D recordings of games and 3D recording tools become common: won't that make it easier for asset rippers to harvest things and sell them?
Technically yes, it will provide a new way for assets to be ripped. However, I don't think it will be very useful or groundbreaking compared to the existing methods. I think it will be harder.
To rip an asset from a 3D capture: you would need to separate it from the rest of the scene, tidy it up, convert the shaders to something that's more engine-generic, name & categorise it (at a minimum). I think it's inevitable that tools to try to automate this will be made (there is money in spamming asset stores with stolen junk), but this is a really hard task that will inevitably result in mostly junk rips. As anyone who has worked with 3D models will attest: you need a lot of human grunt to do even basic things right (and avoid holes in your models); now imagine trying to work on lists of vertices that have been "algorithmically identified" as being part of a particular model or not.
Compare this with the current techniques of stealing assets: copying the asset files from game files directly. Vehicles, characters, objects and maps are often already bundled in easy to use forms; why make things harder by having to deal with 3D captures? You also get to keep things like bones.
Admittedly there are a few exceptions: some games don't store maps as single files, others obfuscate models, etc etc. In these situations the extra effort of going for 3D capture ripping may be worth it for asset ripper-flippers.
Different perspectives on legal and ethics
We want to explore 3D worlds in ways the original devs never imagined.
Indie game devs & modding communities
I know a lot of people that 3D capture and reconstruction would make uncomfortable.
A lot of devs want control over how people play and explore their games
This is really common. Some devs don't like other people modding their works, and this often includes mod makers themselves.
Even simple stuff like playing with effect removal or perf improvements can upset authors & publishers (I've done that!). There is not much point upsetting smaller devs and modders, especially if they're kind enough to be releasing their stuff in the first place.
Often in smaller communities the biggest tension points are other community members stealing assets, but there is also the fear of assets being taken out of the community and used/sold/etc elsewhere.
Bigger game dev companies
- We don't want people using our copyrighted works in ways we have not pre-approved of
- If we oppose 3D capture and/or 3D world reconstruction: that might give us more power over it in later years, leading to pathways to monopolising and/or monetising it.
Almost all big games these days use middleware under license from other companies. Examples include engines, shaders, sound systems and mesh-building algorithms.
Developers have to defend their work against "attacks" that "could" compromise their relationships with 3rd party middleware providers. Anything out of the ordinary will be feared, but in particular anything that dumps things like licensed shaders is a big no-no.
Penalties for not acting violently against potential threats can include:
- Having to pull their game off shelves because they've violated their 3rd party licenses (!)
- Finding much higher pricetags in their next middleware negotiations (!)
- Not being allowed to use certain middlewares in future.
Stick to a few pre-defined pathways:
- Share lossy 3D captures as an equivalent of traditional 2D gameplay videos
- Design 3D tools (both capture & world-reconstruction) to be easy for personal use
- Encourage world-reconstruction only on older games & mods
Don't do the following:
- Share 3D world captures online (with the exception of abandonware worlds)
- Design 3D tools (both capture & world reconstruction) that make rip-and-ship jobs easier.
- Encourage world-reconstruction on newer games & mods
I mention "old" and "newer" in very rough terms. Would the developers be happy to find out people are showing enough interest in their project's worlds that they want to record, extract and explore them? Or would they see these worlds as something current/recent that they still want strict control over? There are many shades of grey here, it depends on a lot of factors.
I'd appreciate other people's thoughts on this. Please feel free to add comments below or email me, just please keep things civil. I care about everyone involved here and if you have made a nice mod or game then that includes you.
Collisions with world meshes
Collision flags on meshes are handled uniquely per game engine and not sent to the GPU (or anywhere else in a standardised format). As such there is no generic way of automatically extracting this data.
I suspect a quick and dirty hack will be enough: if the texture or material on this mesh/triangle/whatever is more than X% transparent: assume it's non-collidable. This covers leaves, grass, signs and glass; but should keep most floors and walls solid. All sorts of corner cases will pop up (catwalks! mesh fences!) but it will be a good enough starting point for enjoying the worlds.
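A minimal sketch of that hack, assuming we can get at the alpha channel of a mesh's texture as a flat list of 0-255 values. The function names, the 128 alpha cutoff and the 30% threshold (my stand-in for "X%") are all placeholders to be tuned.

```python
def transparent_fraction(alpha_values, cutoff=128):
    """Fraction of texels whose alpha (0-255) is below `cutoff`."""
    if not alpha_values:
        return 0.0  # no alpha data: treat the texture as fully opaque
    see_through = sum(1 for a in alpha_values if a < cutoff)
    return see_through / len(alpha_values)


def is_collidable(alpha_values, max_transparent=0.3):
    """Guess that a surface is solid unless too much of it is see-through."""
    return transparent_fraction(alpha_values) <= max_transparent


wall = [255] * 100                 # fully opaque brick texture
leaves = [0] * 70 + [255] * 30     # mostly holes with a few leaf texels
fence = [0] * 50 + [255] * 50      # chain-link fence: half holes

for name, alpha in (("wall", wall), ("leaves", leaves), ("fence", fence)):
    print(name, "collidable" if is_collidable(alpha) else "walk-through")
```

Note how the fence comes out as walk-through: exactly the sort of corner case that would need a per-game override list.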
Replacing DLLs for things like DirectX and OpenGL will trigger some anti-cheat. Unfortunately it's almost universally required for 3D capture, unless you're using modified 3D graphics drivers.
This leads on to Linux: the world where you can modify graphics drivers freely. This will sometimes be a great workaround, but of course a lot of anti-cheat blocks Linux users anyway :|
There will always be other hacked ways of doing 3D captures without triggering anti-cheat, but they are unlikely to be universal or generic.
Alternative to extracting 3D worlds: cam hacks
There are various tools on the web that try to give you freely-controlled ghost cameras in a variety of games. They are limited in a number of ways however:
- often only a small part of the world is loaded at once anyway
- compatibility is often per-game or otherwise poor
Still, these are really nice when they exist. Have a look at the FPD mod for GTA:SA I picture further down.
A few game worlds to ponder
I'm forever turning HUDs off, using developer modes where I can and just plain exploiting game physics to try and get myself to places with new views. I want to enjoy game worlds in ways the makers did not expect.
Invisible walls or gameplay that force you to rush through the world without engaging with it (such as in many modern minimap-driven games) are prevalent. There are a lot of valuable experiences that game worlds have to give to players that only a select few have ever been able to access. In effect games are a two tier museum, where the majority of crowds get pushed through forcefully and can't stop to enjoy things, but a small fraction know how to break in after hours to enjoy things at their own pace. I bring a balaclava, a cup of tea and a bag of chips.
Need for Speed: Most Wanted (2005)
The game never lets you go off and explore the mountains, you're stubbornly stuck on the roads:
This game would be awesome to have a picnic in, watching the cars go past and then going for a wander.
Grand Theft Auto San Andreas
GTASA already lets you explore pretty much everything, but camera-hack mods like personalised FPD can give you some really unique experiences compared to the base game:
Want to go for a trip down town? Don your sunglasses and keep an eye out the back.
Not that GTA can't give you some really nice views of its world without mods:
This one is called "No you idiots!". The more experienced police on the left clearly know a bit more about this genre and engine than the rookies on the right.
Don't mess with the preacher.
I once saw a dolphin underwater in singleplayer, but wasn't able to capture a screenshot at the time. Years later I encountered one when playing online but now I can't be sure it wasn't modded in by the server:
Game worlds can be eerie at times too. Ever driven in an unlit rail tunnel?
A lot of these interactions would never be recoverable with any sort of 3D world extraction mentioned in this post. You need the real game for it to work.
3D world extraction is really a super-nasty workaround that (I think) tells you the original game is missing features to let you explore and enjoy it. It takes so little to add invulnerable free-walking or free-flying features to a game and the payoff is massive if your game has a living, breathing world.
C&C Renegade and its offshoots
Temple of Cervinae is a map where you're supposed to fight waves of deer and deer bosses through a linear sequence of events and rooms. It's really hard, have a look at the final boss:
In order to beat him I cheated myself to be invulnerable. This change led to some interesting and unexpected interactions with the rest of the world inside this map.
Normally the deer want your blood. When they can't hurt you they instead become more like a group of friends that wants to follow you around everywhere:
They even ask you for food:
Call of Duty 1
There are some amazing dystopian hellscapes in this game:
From the mod Sweet Half-life: some endings very much worth watching
I have a lot of HL2 captures I put together to print out as some photos for a friend years ago. There is far too little space here to put them all, I don't want to make things longer, but I will share these stereoscopic images for those that know how to diverge their eyes:
(Note: you may have to zoom out a bit, depending on the size of your screen. Hold down Control and slowly use the scroll-wheel on your mouse)
Next time you play a game
Look out the window:
... then try and find ways of jumping out and climbing down those trees.