
Final Assault - Title song
Slovak group GMG released a full-featured FPS for a 40+ year old computer (64 KB of RAM, 1.79 MHz 6502)
Features:
- raycaster engine running at 25 - 30 FPS
- animated textures
- lighting system
- destroyable walls
- automap
- 3 enemy types
- final boss
game is fully playable in Altirra emulator
video: https://www.youtube.com/watch?v=lRd3MucaRoU
homepage: https://atari8.dev/final_assault
discussion: https://atariage.com/forums/topic/326709-final-assault-new-g...
the FPS is really smooth - and the floating point (?!) calculations of the raycasting engine seem to be totally "on point" ?!?!
A comment on a limitation of the ray casting:
https://atariage.com/forums/topic/326709-final-assault-new-g...
simply amazing what they did. could you imagine if this came out in the 80s?!?!
Yes this would have been mind-blowing.
Ps there was a game that had a similar FPS view in those days. It was a maze game with polygon graphics but no textures. I forget the name. No shooting though!
* Mercenary: Escape from Targ (Novagen) http://mercenarysite.free.fr/mercframes_graphic.htm "open world" vector FPS/RPG, fully RAM-resident, in 48k & 64k editions.
* Alternate Reality: The City (Datasoft/Paradise Programming) FRPG with 90-degree turns, rendered walls/doors as scaled textures between the animated backdrop and foreground NPC sprites.
I wrote an FAQ as a kid for Mercenary for local BBS' - I think I discovered at least 3 victory conditions. :^)
There were a few.
I had Sultan's Maze[1][2] and Arcticfox, but plenty of others existed too.
[1] https://en.m.wikipedia.org/wiki/Sultan's_Maze
I could tell something was off, but couldn't put my finger on it. Very cool
So I was curious about how the Atari 800 handled floating point calculations. As it turns out, Steve Wozniak helped develop the FP routines for the 6502.
http://archive.6502.org/publications/dr_dobbs_journal_select...
I doubt there are any floating point numbers because FP is very slow to emulate (for example, just to add two numbers you have to shift mantissas and 6502 has no fast way to do it). If I was writing a game for an 8-bit CPU, I would use fixed point numbers (for example, 8.8 numbers which use 8 bits for integral part and 8 bits for the fractional part, or maybe 10.6, or 12.4 numbers).
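To make the 8.8 idea concrete, here is a minimal sketch in Python (not 6502 code, and not taken from the game). The helper names `to_fixed`, `fx_add` and `fx_mul` are made up for illustration, and values are treated as unsigned 16-bit.

```python
FRAC_BITS = 8            # 8.8 format: 8 integer bits, 8 fractional bits
ONE = 1 << FRAC_BITS     # 1.0 is stored as 256
MASK = 0xFFFF            # keep results in 16 bits, like a two-byte value

def to_fixed(x):
    """Convert a Python float to unsigned 8.8 (rounded, wrapped to 16 bits)."""
    return int(round(x * ONE)) & MASK

def to_float(f):
    """Convert an 8.8 value back to a float for inspection."""
    return f / ONE

def fx_add(a, b):
    # No alignment step is needed: both operands share the same scale.
    return (a + b) & MASK

def fx_mul(a, b):
    # The raw product has 16 fractional bits; shift right by 8 to return to 8.8.
    return ((a * b) >> FRAC_BITS) & MASK
```

For example, `to_float(fx_mul(to_fixed(1.5), to_fixed(2.0)))` gives `3.0`. Addition is just an integer add, which is the point of the comparison with floating point above.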
The 6502 cannot multiply or divide in hardware (division is a costly operation even on modern CPUs), so I would add or subtract logarithms instead. A 12-bit precision logarithm table requires just 8 KB of RAM (and you can get 13-bit precision with interpolation).
The disadvantage of this approach is that every time you convert a number from linear to logarithm or vice versa you get approximately up to 0.3% error (for 12-bit logarithm). So multiplying or dividing several numbers in a row is fine, but if you have to alternate multiplication with additions you will accumulate an error. So I would look for formulas that avoid this. But for a game a little error in calculations is not noticeable.
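A rough Python sketch of the log-table trick, under my own assumptions: the table stores log2 scaled by `SCALE = 2048` for 12-bit inputs (4096 two-byte entries, so 8 KB), and a second, hypothetical antilog table `EXP` covers only the fractional part. The exact layout and the error you get depend on these choices.

```python
import math

SCALE = 2048   # fractional resolution of the stored log2 (an assumption)

# 4096 two-byte entries = 8 KB. Entry 0 is a placeholder, log(0) is undefined.
LOG = [0] + [round(math.log2(x) * SCALE) for x in range(1, 4096)]

# Antilog table for the fractional part only: 2**(f/SCALE) in 1.12 fixed point.
EXP = [round(2 ** (f / SCALE) * 4096) for f in range(SCALE)]

def antilog(l):
    whole, frac = divmod(l, SCALE)   # floor division also handles negative l
    m = EXP[frac]                    # mantissa in the range 4096..8191
    shift = whole - 12               # undo the 1.12 scaling of the mantissa
    return m << shift if shift >= 0 else m >> -shift

def log_mul(a, b):
    """Approximate a * b with one addition and three table lookups."""
    if a == 0 or b == 0:
        return 0
    return antilog(LOG[a] + LOG[b])

def log_div(a, b):
    """Approximate a // b (b must be non-zero) with one subtraction."""
    if a == 0:
        return 0
    return antilog(LOG[a] - LOG[b])
```

Every call converts to logarithms and back, so each result carries a small rounding error. That is why chaining multiplications inside the log domain is fine, while alternating with additions accumulates error.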
Also, one might think that the most time-consuming part of a pseudo-3D game is the math. I doubt that. Most CPU cycles are usually spent on rasterisation and applying textures. It is easy to calculate the positions of the 3 vertices of a triangle, but it takes a lot of time to draw it line by line, pixel by pixel, and if you want textures this time can be multiplied 5x-10x.
An 8 KB table for logs would consume 12.5% of the 64 KB that you have for the engine, game, intro and end sequence.
> for example, just to add two numbers you have to shift mantissas and 6502 has no fast way to do it
How much work is there to do beside shifting mantissa? (A shift is also necessary for many fixed point calculations).
IIRC, the Atari ROM routines used a different 6-byte BCD floating point format.
I can assure you that no BCD routines (neither Woz's nor Atari's) were hurt during production of this game. They are really slow. You need to use all kinds of tricks and cheats when creating a 3D game on an 8-bit machine, and all needed calculations are precalculated in lookup tables.
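As an illustration of what "precalculated in lookup tables" can look like, here is a small Python sketch of a sine table indexed by a one-byte angle. This is a generic 8-bit-era trick, not the game's actual tables. The 256-step angle unit and the names `SIN_TABLE`, `sin8` and `cos8` are my assumptions.

```python
import math

# 256 angle steps per full turn, so an angle fits in one byte and wraps for free.
# Values are scaled to the signed 8-bit range -127..127.
SIN_TABLE = [round(math.sin(2 * math.pi * a / 256) * 127) for a in range(256)]

def sin8(angle):
    """Table lookup instead of computing a sine at run time."""
    return SIN_TABLE[angle & 0xFF]

def cos8(angle):
    # Cosine is sine shifted by a quarter turn (64 steps).
    return SIN_TABLE[(angle + 64) & 0xFF]
```

On the real machine the table would be built ahead of time and stored as 256 bytes. At run time a sine costs one indexed load.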
- 80x30 resolution
- mostly 8 colors
It's impressive, but unplayable.
Having written realtime ray tracers in the classic demoscene style on e.g. 300 MHz Pentium 2 Celeron (the original one with no cache, that overclocked to 450-550 MHz), I sometimes wonder about how cool it would be to open source a modern rendering engine around the time of the first Pentium 3 with SSE, in 1999.
You could completely revolutionise computer graphics on that era of hardware, with the view to increasing vectorisation, and probably strongly steer it towards ray tracing instead of rasterisation (even skipping over the local minimum of k-D tree methods, the introduction of Surface Area Heuristic and eventually settling on modern BVH building and traversal).
I was under the impression that the theory was there but the hardware was not. Like, rasterization was a necessary evil because it gave better results more quickly (and artists needed that feedback).
Yes, absolutely right, and for games it was the only option with no fast floating point and very limited memory.
First time reading about BVH. Sounds like Kirkpatrick's hierarchy, but in arbitrary dimension.
What's the advantage over BSP/kD-trees/octrees?
And what do you mean by rasterization - we still have to deal with pixels in the end, so it has to happen somewhere? (..I'd love to play with a color vector monitor though!).
> What's the advantage over BSP/kD-trees/octrees?
With BVH, the partitioning is fundamentally over lists of objects rather than space; if you split by space, you can/will have objects on both sides of the splitting plane, leading to duplicate references.
Doing it by lists means there are no duplicate references; however, the combined bounding volumes can overlap, which is to be minimised, subject to the Surface Area Heuristic cost. It winds up being something like a quicksort, although for the highest quality acceleration structures you also want to do spatial clipping... this is an extremely deep field, and several people have devoted a considerable part of their professional careers to it, for example the amazing Intel Embree guys :)
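For readers who have not seen it, the Surface Area Heuristic can be sketched in a few lines of Python. This is the textbook form of the cost as I understand it. The function names and the cost constants `c_trav` and `c_isect` are placeholders, and boxes are `(min_xyz, max_xyz)` tuples.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as (min_xyz, max_xyz)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(parent, left, n_left, right, n_right, c_trav=1.0, c_isect=1.0):
    """Estimated cost of splitting a node into two children.

    A child is weighted by the chance that a ray hitting the parent also
    hits that child, approximated by the ratio of surface areas.
    """
    area = surface_area(parent)
    p_left = surface_area(left) / area
    p_right = surface_area(right) / area
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)
```

A builder would evaluate this for several candidate partitions of the object list and keep the cheapest, or stop splitting when no candidate beats the cost of a leaf (`c_isect` times the number of objects).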
It also happens to work out best for GPU traversal algorithms, which was investigated by software simulation quite a few years ago by the now-legendary Finnish Nvidia team, and together with improvements on parallel BVH building methods and further refinements is basically what today's RTX technology is. (As far as I'm reading from the literature over the years.)
Here's a fundamental paper to get started: https://research.nvidia.com/publication/understanding-effici... (Note that these are the same Finnish geniuses behind so many things... Umbra PVS, modern alias-free GAN methods, stochastic sampling techniques, ...)
> And what do you mean by rasterization - we still have to deal with pixels in the end, so it has to happen somewhere?
At a high level, you could think about it as where in your nested loops you put the loop over geometry (say, triangles). A basic rasterizer loops over triangles first, and the inner loop is over pixels. A basic ray tracer loops over pixels, and the inner loop is over triangles (with the BVH acting as a loop accelerator). Just swapping the order of the two loops has significant implications.
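A toy Python sketch of the loop swap, with axis-aligned rectangles standing in for triangles so the coverage test stays trivial. The scene, the sizes and the function names are all made up for illustration.

```python
# Hypothetical scene: (x0, y0, x1, y1, depth, color), smaller depth is nearer.
SCENE = [(1, 1, 6, 4, 5.0, "A"), (3, 2, 8, 6, 2.0, "B")]
W, H = 10, 8

def rasterize(scene):
    color = [["."] * W for _ in range(H)]
    depth = [[float("inf")] * W for _ in range(H)]
    for x0, y0, x1, y1, z, c in scene:              # outer loop: geometry
        for y in range(max(y0, 0), min(y1, H)):     # inner loops: covered pixels
            for x in range(max(x0, 0), min(x1, W)):
                if z < depth[y][x]:                 # depth buffer resolves overlap
                    depth[y][x] = z
                    color[y][x] = c
    return color

def ray_trace(scene):
    color = [["."] * W for _ in range(H)]
    for y in range(H):                              # outer loops: pixels
        for x in range(W):
            nearest = float("inf")
            for x0, y0, x1, y1, z, c in scene:      # inner loop: geometry
                if x0 <= x < x1 and y0 <= y < y1 and z < nearest:
                    nearest = z
                    color[y][x] = c
    return color
```

Both functions produce the same image here. The difference is what each one needs in memory: the rasterizer keeps a depth value for every pixel, while the ray tracer needs all geometry reachable from every pixel, which is the inner loop a BVH prunes.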
Octrees divide space into regular sized chunks. BVH divides space into chunks of varying size, but with a balanced population of bodies in each. The idealized BVH divides the population by 2 at each level.
Compared to octrees, BVHs deal well with data that’s unevenly distributed. At each level you split along the axis where you have the most extent. Finding the pivot is the interesting part. When I recently implemented a BVH from scratch, I ended up using Hoare partitioning and median-of-three and it worked really well. The resulting structure is well balanced, splitting the population of bodies roughly in half at each level, and that’s not even the state of the art, that’s just something my dumb ass coded in an afternoon.
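A minimal Python sketch of that kind of builder, with two simplifications that are mine: the split axis is picked from the extent of the box centroids, and a full sort stands in for Hoare partitioning with median-of-three. Boxes are `(min_xyz, max_xyz)` tuples and the dict-based node layout is only for illustration.

```python
def bounds(boxes):
    """Union of axis-aligned boxes, each given as (min_xyz, max_xyz)."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return lo, hi

def build_bvh(boxes, leaf_size=2):
    node = {"bounds": bounds(boxes)}
    if len(boxes) <= leaf_size:
        node["items"] = boxes                     # leaf: keep the object list
        return node
    cents = [tuple((b[0][i] + b[1][i]) / 2 for i in range(3)) for b in boxes]
    # Split along the axis where the centroids are most spread out.
    axis = max(range(3),
               key=lambda i: max(c[i] for c in cents) - min(c[i] for c in cents))
    order = sorted(range(len(boxes)), key=lambda k: cents[k][axis])
    mid = len(order) // 2                         # median: half the objects per side
    node["left"] = build_bvh([boxes[k] for k in order[:mid]], leaf_size)
    node["right"] = build_bvh([boxes[k] for k in order[mid:]], leaf_size)
    return node

def leaves(node):
    """Collect the object lists of all leaves, left to right."""
    if "items" in node:
        return [node["items"]]
    return leaves(node["left"]) + leaves(node["right"])
```

Because the partition is over the object list, every box ends up in exactly one leaf. Only the bounding volumes of sibling nodes may overlap.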
The game uses ray casting, not ray tracing. Ray casting is when you send a ray once for every column of pixels to get a distance to a wall. Also, if the walls are only horizontal or vertical the calculations get simpler.
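For anyone who has not written one, a grid ray caster of this kind can be sketched as below. This is the usual textbook grid-stepping (DDA) approach in Python with floats, not the game's code. The map, the function names and the 80x30 defaults (borrowed from the resolution mentioned earlier in the thread) are illustrative assumptions.

```python
import math

# Hypothetical map: '#' is a wall cell, '.' is open floor. The border is solid.
GRID = ["#####",
        "#...#",
        "#...#",
        "#####"]

def first_crossing(p, cell, d, delta):
    """Ray length from p to the first grid line along one axis."""
    if d > 0:
        return (cell + 1 - p) * delta
    if d < 0:
        return (p - cell) * delta
    return math.inf

def cast_ray(px, py, angle):
    """Distance from (px, py) to the first wall cell along angle (radians)."""
    dx, dy = math.cos(angle), math.sin(angle)
    cx, cy = int(px), int(py)
    ddx = abs(1 / dx) if dx else math.inf     # ray length per whole cell in x
    ddy = abs(1 / dy) if dy else math.inf     # ray length per whole cell in y
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    side_x = first_crossing(px, cx, dx, ddx)
    side_y = first_crossing(py, cy, dy, ddy)
    while True:                               # walls are axis-aligned, so we only
        if side_x < side_y:                   # ever test grid-line crossings
            dist, side_x, cx = side_x, side_x + ddx, cx + step_x
        else:
            dist, side_y, cy = side_y, side_y + ddy, cy + step_y
        if GRID[cy][cx] == "#":
            return dist

def render_columns(px, py, facing, fov=math.pi / 3, columns=80, screen_h=30):
    """One ray per screen column; returns the wall height for each column."""
    heights = []
    for col in range(columns):
        angle = facing - fov / 2 + fov * (col + 0.5) / columns
        # Project onto the view direction to avoid the fisheye effect.
        dist = cast_ray(px, py, angle) * math.cos(angle - facing)
        heights.append(min(screen_h, int(screen_h / max(dist, 1e-6))))
    return heights
```

With 80 columns that is 80 rays per frame, each one a short loop of additions and comparisons, which is why this style of renderer is feasible on slow hardware.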
Also, I wonder how you can achieve a clock frequency like 300 MHz without a cache. Shouldn't the CPU stall on fetching every instruction?
> The game uses ray casting, not ray tracing. Ray casting is when you send a ray once for every column of pixels
Both ‘ray casting’ and ‘ray tracing’ are overloaded terms, the distinction isn’t as clear as you suggest. You’re talking about 2d ray casting, but 3d ray casting is common, and means to many people the same thing as ‘ray tracing’. Ray casting “is essentially the same as ray tracing for computer graphics”. https://en.wikipedia.org/wiki/Ray_casting
There’s also Whitted-style recursive ray tracing, and path tracing style ray tracing, but ray tracing in its most basic form means to test visibility between two points, which is what ray casting also means from time to time.
I've given up on the semantics of "ray-tracing", everyone has their own opinion. However it's fairly common for "ray-casting" to mean non-recursive, and the wiki article you link explicitly says this.
I think the biggest difference between ray-type algorithms is everything vs ray-marching, because regardless of recursion, and strategies towards lighting, texture sampling and physical realism, with ray-marching a single ray is not really a ray at all but lots of little line segments, and you don't usually bother finding explicit intersections, which gets really complex and expensive... that's the whole point. Instead you find proximity or depth, which means you can render implicit surfaces like fractals.
Yes, ray casting does not ever imply recursion, it’s simply referring to casting a ray to test visibility from one point to another. Ray tracing now most commonly means exactly the same thing, and expecting anything else will often lead to confusion.
“Ray marching” is also overloaded ;) but what you’re referring to (also and originally called ‘sphere tracing’) is a new and separate idea from either ray casting or ray tracing (if you’re thinking of something other than ray casting when you say that.) Ray casting/tracing is most often done using non-iterative analytic intersection functions, where the style of ray marching you’re referring to is a distance field query, not a point to point visibility query, so ”ray marching” generally implies a different traversal algorithm, and (usually) a different representation of the geometry.
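A small Python sketch of sphere tracing against a distance field, to show how it differs from an analytic intersection test. The scene is a single sphere and the names `sphere_sdf` and `sphere_trace`, along with the step limits, are illustrative assumptions.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    """Signed distance from point p to the surface of a sphere."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, max_dist=100.0):
    """March along a unit-length direction; return hit distance or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)           # distance query: how far is the nearest surface?
        if d < eps:
            return t         # close enough to count as a hit
        t += d               # safe step: nothing can be nearer than d
        if t > max_dist:
            break
    return None
```

The loop never solves for an intersection. It only asks the field for a distance and steps by that amount, so any shape with a distance estimate works, including implicit surfaces such as fractals.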
You can use any/all of these to build path tracers, but they come with different tradeoffs.
The Covington Celerons had an L1 cache but no L2 cache. The Pentium II of the era had an off-die L2 cache. So the Celeron was a binned Pentium II without any L2 cache. A later model of the Celeron was released with a 128k off-die L2 cache.
At their base clock speeds the Celerons were middling chips. But since they readily overclocked you could get them up to 450-466MHz. They wouldn't be equivalent to the same speed Pentium II (because of no L2 cache) but they punched above their weight for the price.
The Celeron-A's had on-die cache, making them pretty much a match for a regular P2 of the same clock/bus speed. There was also a mobile P2 with 256K of on-die L2, predating Coppermine P3's.
I must have misremembered, it's been a long time. I swore there were Celeron 300As that were Covington cores with the external L2 cache but it makes sense they were the Mendocino cores with the on-die L2.
I really wanted an overclocked Celeron set up but for the year or so they were hot shit my upgrade money went to storage. That was more pressing for me at the time. By the time I was due for (and could afford) a system upgrade I was able to go directly to a Pentium III.
Is there somewhere I can read more about the history of these different techniques?
Here's a video: https://www.youtube.com/watch?v=92K9wnk_4Cw
Pretty interesting but it's a little hard to make out what's going on sometimes.
Thank you. I can't help but feel the wall textures have screwed the whole thing. It might be that they've tried to just be too clever with this and a less complex solution (flat shading, gouraud shading) would have worked better.
Yeah I had the same thought. I played a few FPS on my ti-83 back in the day, it was black and white, and really it only drew the edges of the walls and outlines of people. But I could tell pretty well what was going on in comparison to this.
This Atari game might also look a lot better on a CRT or something.
If you reduce the window to a very small size (or look at your monitor from a few meters away) then it's actually easier to see what's going on.
I wonder how it would look on a CRT TV over RF.
It's difficult for me to tell if the player is looking at a wall or a hallway...
Maybe it’s a wallway
It's a bit unclear what's demoing visuals vs. gameplay. It seemed often that when the player reaches a corner, instead of turning the corner and continuing down the corridor the player would sometimes turn into the corner and wiggle. I wonder is there something about corners that's a bit janky or is that just a habit of this particular user that they sometimes just turn left and right randomly particularly in corners?
The gameplay is quite interesting; I noticed the upgrade on the game. I hope I have enough time to try the game out.