Perspective Projection

WHY DIVIDE BY Z? Unraveling the geometry behind perspective projection.
by Toshi Horie

The first step toward 3D graphics in QB is to find out how to convert
3D points to screen coordinates.  Graphics people call this process 
"perspective projection." In online tutorials, we see formulas like
xs = x/z, ys = y/z without explanation.  Why divide by z?  Is it just an 
approximation?  Or is there really a geometric reason behind it?

The first thing to do to figure out the answers to these questions 
is to draw a nice diagram.  Imagine yourself looking down from the ceiling 
at your monitor and where you usually sit.  Here is my little ASCII diagram
to help you.

The 3D object, say a baseball, is at point P, and it is displayed 
on the screen of the monitor at point S.  The eye is at E, and the center of 
the screen is at point C.  All points are defined so that the top left corner 
of the screen is the origin (0,0,0), and +y is down and +x is to the right 
and +z is into the monitor (up in the following diagram).  The units 
for position are in SCREEN 13 pixels, since that's the screen mode the 
sample code will be working in.  

'  [top-down view of screen, sliced at y=100]
                        Q+--------- * P(xp,100,zp)
     behind screen       |         /  (a point in 3D - assume y is 100 for now)
                         |        /
       0..  (160,100,zs) |       /              (320,100,zs)
       |=================C======S===============| <-- screen
                         |     / (xs,100,zs)
     ^+z                 |    /   where the pixel is lit
     |                   |   /
     +-->+x              |  /
                         | /
                       E |/
 In this figure,
   * the eye is at E (160,100,zeye).
   * the center of the screen 13 is at C (160,100,zs).
   * the point in 3D is P (xp,100,zp).
   * the point on the screen where you would plot the pixel
      corresponding to the point in 3D is S at (xs,100,zs).

 Now you have to notice that we have two similar triangles:
   - Triangle ECS and Triangle EQP are similar triangles.

 (In case you don't know what similar triangles are, they are 
 triangles with the same shape but of different sizes**.  They 
 have the property that their corresponding sides are proportional, 
 meaning they are magnified by the same amount, and thus the ratio 
 between the corresponding sides is the same.)

** There's a special case when similar triangles have the same 
size as well, but those are usually called "congruent triangles."

 This means that the ratio of the corresponding sides
 of the triangle is the same for ECS and EQP!  Which means:
          EC         CS
         ----   =  ------        .... Eq. 1
          EQ         QP

 * Notice, that
     - EC is just the distance from your eye to the screen in
       pixels, so it would be around 640 pixels in screen 13.
       (How did I get 640?  Well, my monitor's screen is 11 inches 
        wide.  And my eye is approximately 22 inches from 
        the screen [yes, I measured it], which means that my eye 
        is twice as far from the screen as the screen is wide.  
        Since the 11 inches of width is covered by 320 pixels, 22 
        inches should be covered by twice the pixels, or 640 pixels.
        If you are closer or farther from the screen, you have 
        to change this length accordingly.)
     - EQ is how far behind the screen the 3D point is, plus EC.
          so it is (zp-zs)+640
     - CS is (xs-160), in screen 13 pixels.
     - QP is (xp-160), again in screen 13 pixels.
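
   If your monitor or seating distance is different, the eye-to-screen
   distance EC can be estimated the same way.  Here is a minimal sketch;
   the variable names are made up for this example, and the two
   measurements are the ones quoted above, so substitute your own.

```basic
' Estimate the eye-to-screen distance (EC) in SCREEN 13 pixels.
' monitorWidth! and eyeToScreen! are example measurements in inches.
monitorWidth! = 11      'visible screen width
eyeToScreen! = 22       'distance from the eye to the screen
ec! = 320 * eyeToScreen! / monitorWidth!   '320 pixels span the screen width
PRINT "EC in pixels:"; ec!                 '640 for these measurements
```

   Any unit works for the two measurements, as long as both use the
   same one, because only their ratio matters.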

   Now we want to find out what xs is, because that's the x coordinate
   of the point we want to plot with PSET.

 Substituting the values above into equation 1, we get:
 (remember the distance between the eye and center of the screen is 640)
           640           xs-160
       -----------  =  -----------   ... Eq. 2
       (zp-zs)+640       xp-160

 Now, if we assume the screen is at z=0, then zs drops out and
 things get easy.
           640           xs-160
       -----------  =  -----------   ... Eq. 3
         zp+640          xp-160

'[first figure with more numbers filled in]
                         Q+--------- * P(xp,100,zp)
       behind screen     |         /  (a point in 3D)
                         |        /               
             (160,100,0) |       /              (320,100,0)
                        :|     / (xs,100,0)
                        6|    /   pixel for point
                        4|   /
                        0|  /
                        :| /

 We want to solve this for xs, so here it goes:
   - multiplying both sides by the (xp-160), we get

                 640 * (xp-160)
      xs-160 = ------------------        ... Eq. 3b
                    zp+640

   - adding 160 to both sides of the equation, we get

              640 * (xp-160)
      xs  = ------------------  + 160    ... Eq. 4 (origin at top left corner of screen)
                 zp+640
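
To get a feel for Eq. 4, it may help to plug in some numbers.  The
sketch below uses a made-up sample point (the values are not from the
derivation) and assumes the same setup as above: origin at the top left
corner, screen at z=0, and the eye 640 pixels in front of the screen.

```basic
' Worked example of Eq. 4: xs = 640 * (xp - 160) / (zp + 640) + 160
' The sample point is made up for illustration.
xp! = 260: zp! = 640      '100 pixels right of center, 640 pixels behind the screen
xs! = 640 * (xp! - 160) / (zp! + 640) + 160
PRINT xs!                 '64000 / 1280 + 160 = 210

zp! = 0                   'the same point moved onto the screen plane
xs! = 640 * (xp! - 160) / (zp! + 640) + 160
PRINT xs!                 '64000 / 640 + 160 = 260, so xs = xp
```

At zp=640 the point is twice as far from the eye as the screen is, so
its offset from the center of the screen is cut in half.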

Next, we will find the formula for ys, then we can 
plot 3D points on the screen using PSET(xs,ys),colour.

   Okay, we got the formula for xs when y=100, but this same formula
   actually works for y<>100.  Why is this? Here is an intuitive 
   explanation:

     if i was standing on a cliff ...
     looking into oblivion
     and there's this giant orb that just floats
     say it's "30 units to the right of the center of my FOV"
     and it moves along the (vertical) y-axis
     no matter how far up or down it goes that x-coord is 
      staying the same
   ....................a more difficult explanation...................
  : The mathematical reason behind it has to do with projection again. :
  : Say y=120 (the 3D point is at xp,120,zp).  The similar triangles   :
  : formed by this point and the eye will match the one                :
  : with y=100 if you project it to the y=100 plane.                   :
  :....................................................................:
  Because y does not have to be 100, the formula for xs, given in
  equation 4 can be used any time we need to project 3D points to 
  the screen.

   This gives us a formula for xs. But what about ys? It turns out
that ys can be found in almost the exact same way!

  Now you can get off the ceiling :)  Sit back in your seat, and rotate the 
monitor sideways so you can't see what's on the screen.  Before you do that,
you might want to copy the diagram below, so you can compare how the monitor
looks to the diagram.  Okay, since the screen is sideways, the +z axis points 
to the right in the diagram, and the +y axis points down. The baseball is 
now at point P' (pronounced "pee-prime") this time.

'[side view of monitor and eye]

   in front of <---  screen  --> behind screen
         +-->into +z   ||     ::::::::::   
         |   screen    ||          (assume x = 160 
      +y v             ||           for all points)
      down             ||                       ::::::::
                       ||                             :::        
                       ||                             :::   
   Eye(160,100,zeye)   ||            behind screen    :::  
            E----------C------+ Q'                    :::
              \        ||     |                       :::
                \      ||     |                       :::
                  \    ||     |                       :::
                     \ ||     |                      :::
                       S'     |                      :::
                       || \   |                      :::
                       ||   \ |                      :::
                       ||     * P' (160,yp,zp)      :::  
                       ||                  :::::::::::
                       ||      :::::::::::::::::::    
 In this figure,
   * the eye is still at E (160,100,zeye).
   * the center of the screen 13 is still at C at (160,100,zs).
   * the new point in 3D is at P'(160,yp,zp), so everything 
     lies on the x=160 plane, so it's easier to solve.
   * the point on the screen where you would plot the pixel
      corresponding to the point in 3D is S' at (160,ys',zs).

 Now you have to notice that we have two similar triangles:
   - Triangle ECS' and Triangle EQ'P' are similar.

 This means that the *ratio of the corresponding sides*
 of the triangle is the *same* for ECS' and EQ'P'!  So we have:
          EC         CS'
         ----   =  ------          ... Eq. 5
          EQ'       Q'P'

Looks just like equation 1, huh?  I told you that the x and y's
can be solved in the same way!

 The rest of the derivation looks similar too!  Just keep the numbers 
 straight, and you'll be fine.  Plugging in the lengths of the sides of 
 the triangle into equation 5, we get something that looks a lot like 
 equation 2: (remember the distance between the eye and center of 
 the screen is 640 pixels for SCREEN 13.)
           640             ys'-100
       -------------  =  -----------          ... Eq. 6
        640+(zp-zs)        yp-100

 Again, the screen is at z=0, so zs=0 and things get easier.
             640           ys'-100
         -----------  =  -----------          ... Eq. 7
            640+zp         yp-100

 We want to solve this for ys', so here it goes:
   - multiplying both sides by the (yp-100), we get

                  640 * (yp-100)
      ys'-100 = ------------------        ... Eq. 7b
                     640+zp

   - adding 100 to both sides, we get

               640 * (yp-100)
      ys'  = ------------------  + 100    ... Eq. 8 (origin at top left corner of screen)
                  640+zp

   When we solve for ys, why can we forget about the x coordinate and 
   assume it is 160?  I can say, it works by analogy, but that's not 
   a proof.  Here is a physics-based explanation:

     If I was standing on the side of a flat street looking toward the 
   other side, while the cars were passing by in the x direction 
   (horizontally),  I wouldn't see the cars moving up and down, 
   would I?  [Now if this was a sloped street, cars going horizontally
   would be either taking off or crashing into the ground, like in 
   "Back to the Future," but that's another story.]

Because of the above reasoning, once again, we can generalize our 
equation to one that projects any 3D point to the screen, without 
doing any extra work! So ys = ys' if point P' is at the same 
position as point P above.

                  640 * (yp-100)
  ys =  ys' =   ------------------  + 100      ... Eq. 8a (origin at top left corner of screen)
                      640+zp

Together, Equation 4 and equation 8a give us the complete formula for 
plotting 3D points (which have their origin on the top left 
corner of the screen, with +x axis going to the right, the +y axis pointing down,
and the +z axis pointing into the monitor) onto the screen.

Here they are again.
              640 * (xp-160)
      xs  = ------------------  + 160      ... Eq. 4
                 zp+640

              640 * (yp-100)
      ys  = ------------------  + 100      ... Eq. 8a
                 640+zp
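
As a quick check, here is a minimal sketch that plots one point with
these two formulas.  The sample point is made up for illustration, and
the code assumes the setup used so far: origin at the top left corner
of the screen, +y down, screen at z=0, eye 640 pixels away.

```basic
' Plot one 3D point using Eq. 4 and Eq. 8a
' (origin at the top left corner of the screen, +y pointing down).
SCREEN 13
xp! = 260: yp! = 40: zp! = 640                'made-up sample point
xs! = 640 * (xp! - 160) / (zp! + 640) + 160   'Eq. 4:  50 + 160 = 210
ys! = 640 * (yp! - 100) / (zp! + 640) + 100   'Eq. 8a: -30 + 100 = 70
PSET (xs!, ys!), 15                           'color 15 is white in the default palette
```

Both offsets from the center of the screen (100 in x, -60 in y) are
halved, because the point is twice as far from the eye as the screen is.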

Wait! "Top left corner of the screen?"  That means (1,-1,1) will be 
plotted off the screen!   Ok, we'll fix this, but there's another 
problem for people used to y axis pointing up.  The y-axis on our 
coordinate system points down!
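  To correct this, we have to return to equation 7b.
(don't worry, it's only a small change!)

                     640 * (yp-100)
     -(ys'-100)  = ------------------     ... Eq. 7b [+y axis is up in 3D point, down on screen]
                        640+zp

Look, all we had to do was add a minus sign!  Now this makes a small change in 
equations 8 and 8a.  Here it is:

                    640 * (yp-100)
      ys  = 100 - ------------------      ... Eq. 8a [y axis fix]
                       640+zp

(With the sign flipped, yp is measured upward and the center of the screen 
is still at yp=100, so for the moment the 3D origin sits at the bottom left 
corner of the screen.  Moving the origin to the center of the screen takes 
care of that.)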

We didn't have to change equation 4 because the screen coordinate 
(abbreviated "screen coord" below) agrees with the Cartesian coordinate system 
(defined by the x, y and z axes) we used.

[to make the origin of points at center of screen]
(Note: These xp and yp variables have values different from the 
       xp and yp in Eq. 4 and 8a.)
                   640 * ((xp+160)-160)
      xs  = 160 + ----------------------      ... Eq. 4c (origin at C)
                       640+zp                       [y axis fix, origin at C]

                    640 * ((yp+100)-100)
      ys  =  100 - ---------------------     ... Eq. 8c (origin at C)
                          640+zp                    [y axis fix, origin at C]

Simplifying, we get a formula that works pretty well 
for plotting 3D points in SCREEN 13.
                     640*xp
      xs  =  160 + -----------   ... Eq. 4c' (origin at C)
                     640+zp                [y axis fix, units in pixels]

                     640*yp
      ys  =  100 - ----------    ... Eq. 8c' (origin at C)
                     640+zp                [y axis fix]
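
To see these two formulas at work, here is a small sketch that draws
the same square at two different depths.  The square size and the
depths are made-up sample values; the code assumes the origin at the
center of the screen, +y up, and the eye 640 pixels from the screen.

```basic
' Draw a 100x100 pixel square at two depths using Eq. 4c' and Eq. 8c'.
SCREEN 13
half! = 50                                  'half the side of the square
FOR zp! = 0 TO 640 STEP 640                 'depths 0 and 640
    zdenom! = 640 + zp!
    x1! = 160 + 640 * -half! / zdenom!      'left edge   (xp = -50)
    x2! = 160 + 640 * half! / zdenom!       'right edge  (xp = +50)
    y1! = 100 - 640 * half! / zdenom!       'top edge    (yp = +50)
    y2! = 100 - 640 * -half! / zdenom!      'bottom edge (yp = -50)
    LINE (x1!, y1!)-(x2!, y2!), 15, B       'B draws the outline of a box
NEXT zp!
```

The square on the screen plane (zp=0) runs from (110,50) to (210,150),
and the one at zp=640 runs from (135,75) to (185,125), half as big and
still centered.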

[How things look with the origin at C (orthogonal projection)]

              (0,yp,zp) Q+--------- * P(xp,yp,zp)
                         |         /  (note: values of xp,yp,zp are different than before)
                         |        /
 (-160,ys,0)     (0,0,0) |       /              (160,ys,0)
                         |     / (xs,ys,0)
                         |    /   pixel for point
                         |   /
                         |  /
                         | /
                       E |/
      ////////////////behind eye/////////////////   

Likewise, we can move the origin to the eye, if you want, although 
this isn't always the best thing, because a point at the origin makes 
you divide by zero and will crash your 3D engine (it's equivalent to 
poking yourself in the eye), unless you write an IF statement to handle 
the special case!  (In fact, all points with z coordinates on or behind 
the eye shouldn't be displayed!) 
But this is actually what most 3D engines do (including OpenGL) when 
doing the perspective transform.

(Note: LET xp3d = xp from Eq. 4c'
           yp3d = yp from Eq. 8c'
           zp3d = zp+640 )

                    640*xp3d
      xs  = 160 + ------------   ... Eq. 4e' (origin at E, y-axis fix)
                      zp3d

                    640*yp3d
      ys  = 100 - ------------   ... Eq. 8e' (origin at E, y-axis fix)
                      zp3d

Well, if we take a quick look at the xs = x/z, ys = y/z in the introduction, 
you'll see that 4e' and 8e' are very close.  (Just take off the centering addition 
and the *640, which multiplies the x and y by the eye-to-screen distance.)  
To really get that, you have to measure everything in special units so that 
the distance from the eye to screen is defined to be 1, use the coordinate
system with the origin (0,0,0) at the eye, and do WINDOW SCREEN (-160,-100)-(160,100) 
to center the screen at (0,0,zs).  Although that is nice in theory, 
when you write a game engine, you don't want to be doing extra divide operations, 
so the forms presented in equations 4e'+8e' or 4c'+8c' work the best.  I suggest 
that you work out the math to prove to yourself that this is true.
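
If you want a head start on that, here is a small numeric sketch.  It
measures the same made-up point two ways, in pixels and in units of the
eye-to-screen distance, and compares the x/z form with the fraction in
Eq. 4e'.  The variable names x! and z! are just for this example.

```basic
' Compare the plain x/z form with the 640*x/z form of Eq. 4e'.
xp3d! = 100: zp3d! = 1280       'made-up point in pixels, origin at the eye
x! = xp3d! / 640                'the same point, measured in units of
z! = zp3d! / 640                'the eye-to-screen distance (640 pixels)
PRINT 640 * xp3d! / zp3d!       'offset from the screen center in pixels: 50
PRINT 640 * (x! / z!)           'x/z converted back to pixels: also 50
```

The two agree because dividing both x and z by 640 leaves the ratio x/z
unchanged; the only thing left over is the single factor of 640 that
converts the result back to pixels.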

Well, we have derived several formulas for perspective projection in SCREEN 13, and we 
found out that the x/z and y/z are accurate ways to do perspective projection when we 
use the correct coordinate system and units.  We will finish this time by writing a 
simple 3D parametric function plotter.

 QBasic code (finally!)

'  3D Perspective Projection Test

SCREEN 13

'set grayscale palette
OUT &H3C8, 0   'start writing palette entries at color 0
FOR i = 0 TO 255: OUT &H3C9, i \ 4: OUT &H3C9, i \ 4: OUT &H3C9, i \ 4: NEXT

'draw wavy thing around zp=100 axis
FOR t! = 0 TO 6 STEP .001
	xp = INT(100 * COS(t!))
	yp = INT(100 * SIN(8 * t!))
	zp = INT(99 * SIN(t!) + 100)

	zdenom = (zp + 640)
	'perspective projection (world space to screen space)
	IF zdenom > 0 THEN
		xs = (160 + xp * 640& \ zdenom) 'using equation 4c'.
		ys = (100 - yp * 640& \ zdenom) 'using equation 8c'.
		r = (640 \ zdenom)              'find size of point
		CIRCLE (xs, ys), r, 200 - zp    'plot it on the screen!
	END IF
NEXT t!

'draw helix around the y axis
FOR t! = 0 TO 60 STEP .001
	xp = INT(100 * COS(t!))
	yp = INT(t! + .5)
	zp = INT(100 * SIN(t!) + 100)
	xp3d = xp
	yp3d = yp
	zp3d = zp + 640

	'perspective projection (world space to screen space)
	'note how zdenom = zp3d
	IF zp3d > 0 THEN 'if point is in front of eye, then
	    'project the 3D point to the screen
		xs = (160 + xp3d * 640& \ zp3d) 'using equation 4e'.
		ys = (100 - yp3d * 640& \ zp3d) 'using equation 8e'.
		r = (640 \ zp3d)                'find size of point
		CIRCLE (xs, ys), r, 200 - zp 'plot it on the screen!
	END IF
NEXT t!

Next time, I'll talk about how to change the field of view, so 
you can get panoramic scenes or binocular zoom vision in your 
perspective code.