WHY DIVIDE BY Z?
Unraveling the geometry behind perspective projection.
by Toshi Horie

The first step toward 3D graphics in QB is to find out how to convert
3D points to screen coordinates. Graphics people call this process
"perspective projection." In online tutorials, we see formulas like
xs = x/z, ys = y/z without explanation. Why divide by z? Is it just an
approximation? Or is there really a geometric reason behind it?

The first thing to do to answer these questions is to draw a nice
diagram. Imagine yourself looking down from the ceiling at your monitor
and where you usually sit. Here is my little ASCII diagram to help you.

The 3D object, say a baseball, is at point P, and it is displayed on the
screen of the monitor at point S. The eye is at E, and the center of the
screen is at point C. All points are defined so that the top left corner
of the screen is the origin (0,0,0), +x is to the right, +y is down, and
+z is into the monitor (up in the following diagram). The units for
position are SCREEN 13 pixels, since that's the screen mode the sample
code will be working in.

'            [top-down view of screen, sliced at y=100]
'          (160,100,zp)
                  Q+--------- * P(xp,100,zp)   behind screen
                   |         /   (a point in 3D - assume y is 100 for now)
                   |        /
0..   (160,100,zs) |       /   (320,100,zs)
 |=================C======S===============|   <-- screen
                   |     /  (xs,100,zs)             ^+z
                   |    /  where the pixel is lit   |
                   |   /                            +-->+x
                   |  /
                   | /
                  E|/ eye(160,100,zeye)

In this figure,
* the eye is at E (160,100,zeye).
* the center of the SCREEN 13 display is at C (160,100,zs).
* the point in 3D is P (xp,100,zp).
* the point on the screen where you would plot the pixel
  corresponding to the point in 3D is S at (xs,100,zs).

Now you have to notice that we have two similar triangles:
- Triangle ECS and Triangle EQP are similar triangles. They share the
  angle at E, and each has a right angle (at C and at Q).
(In case you don't know what similar triangles are, they are
triangles with the same shape but of different sizes**.
They have the property that their corresponding sides are proportional,
meaning they are magnified by the same amount, and thus the ratio
between the corresponding sides is the same.)

** There's a special case when similar triangles have the same size
   as well, but those are usually called "congruent triangles."

This means that the ratio of the corresponding sides of the triangle is
the same for ECS and EQP! Which means:

        EC     CS
       ---- = ----     .... Eq. 1
        EQ     QP

* Notice that
- EC is just the distance from your eye to the screen in pixels, so it
  would be around 640 pixels in SCREEN 13.
  (How did I get 640? Well, my monitor is 11 inches wide. And my eye is
  approximately 22 inches from the screen [yes, I measured it], which
  means that my eye is twice as far from the screen as the screen is
  wide. Since the 11 inch width is covered by 320 pixels, 22 inches
  should be covered by twice the pixels, or 640 pixels. If you are
  closer to or farther from the screen, you have to change this length
  accordingly.)
- EQ is how far behind the screen the 3D point is, plus EC,
  so it is (zp-zs)+640.
- CS is (xs-160), in SCREEN 13 pixels.
- QP is (xp-160), again in SCREEN 13 pixels.

Now we want to find out what xs is, because that's the x coordinate of
the point we want to plot with PSET. Substituting the values above into
equation 1, we get:
(remember the distance between the eye and center of the screen is 640)

          640         xs-160
      ----------- = ----------     ... Eq. 2
      (zp-zs)+640     xp-160

Now, if we assume the screen is at z=0, then zs drops out and things
get easy.

        640       xs-160
      ------- = ----------     ... Eq. 3
      zp+640      xp-160

'        [first figure with more numbers filled in]
'
'          (160,100,zp)
                  Q+--------- * P(xp,100,zp)   behind screen
                   |         /   (a point in 3D)
                   |        /
      (160,100,0)  |       /    (320,100,0)
 |=================C======S===============|
                  :|     /  (xs,100,0)
                  6|    /  pixel for point
                  4|   /
                  0|  /
                  :| /
                 E:|/ eye(160,100,zeye)

We want to solve this for xs, so here it goes:
- multiplying both sides by (xp-160), we get

                640*(xp-160)
      xs-160 = --------------     ... Eq. 3b
                   640+zp

- adding 160 to both sides of the equation, we get

            640*(xp-160)
      xs = -------------- + 160     ... Eq. 4
               640+zp               (origin at top left corner of screen)

Next, we will find the formula for ys, then we can plot 3D points on the
screen using PSET(xs,ys),colour.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HOW COME WE CAN ASSUME Y=100?

Okay, we got the formula for xs when y=100, but this same formula
actually works for y<>100. Why is this?

Here is an intuitive explanation:
    If I were standing on a cliff looking into oblivion, and a giant
    floating orb sat, say, "30 units to the right of the center of my
    FOV", then no matter how far up or down it moved along the
    (vertical) y-axis, its x-coord would stay the same.

....................a more difficult explanation...................
: The mathematical reason behind it has to do with projection again. :
: Say y=120 (the 3D point is at xp,120,zp). The similar triangles    :
: formed by this point and the eye will match the one                :
: with y=100 if you project it to the y=100 plane.                   :
....................................................................

Because y does not have to be 100, the formula for xs given in
equation 4 can be used any time we need to project 3D points to the
screen.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This gives us a formula for xs. But what about ys? It turns out that
ys can be found in almost the exact same way!
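
Before moving on to ys, Equation 4 can be sanity-checked with a few
lines of QBasic. This is only a sketch: the variable name eyeDist! and
the sample point (xp = 260, pushed back in steps of 640) are made up for
illustration, and the 640 is the eye-to-screen distance measured above.

```qbasic
' Sanity check of Eq. 4 (origin at top left of screen, screen at z = 0).
' eyeDist! and the sample point are illustrative assumptions.
eyeDist! = 640                 ' eye-to-screen distance in pixels
xp! = 260                      ' 100 pixels to the right of center C
FOR zp! = 0 TO 1920 STEP 640   ' push the point deeper behind the screen
    xs! = eyeDist! * (xp! - 160) / (eyeDist! + zp!) + 160
    PRINT "zp ="; zp!; "  xs ="; xs!
NEXT zp!
```

A point lying on the screen itself (zp = 0) should project to its own x
coordinate, 260. At zp = 640, as far behind the screen as the eye is in
front of it, the offset from the center is halved and xs comes out as
210. The deeper the point goes, the closer xs creeps toward 160.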

Now you can get off the ceiling :) Sit back in your seat, and rotate
the monitor sideways so you can't see what's on the screen. Before you
do that, you might want to copy the diagram below, so you can compare
how the monitor looks to the diagram.

Okay, since the screen is sideways, the +z axis points to the right in
the diagram, and the +y axis points down. The baseball is at point P'
(pronounced "pee-prime") this time.

'                [side view of monitor and eye]
'  in front of <--- screen --> behind screen
                     ====
                      ||:::::::::
   +-->into +z        ||   ::::::::::
   |   screen         ||  (assume x = 160
+y v                  ||   for all points)
 down                 ||          ::::::::
                      ||               :::
                      ||                :::
 Eye(160,100,zeye)    ||  behind screen  :::
           E----------C------+ Q'        :::
             \        ||     |           :::
               \      ||     |           :::
                  \   ||     |          :::
                    \ ||     |          :::
                      S'     |         :::
                      || \   |         :::
                      ||   \ |        :::
                      ||     * P' (160,yp,zp)
                      ||  :::::::::::
                      ||:::::::::::::::::::
                      ||::::::::::
                     ====

In this figure,
* the eye is still at E (160,100,zeye).
* the center of the SCREEN 13 display is still at C (160,100,zs).
* the new point in 3D is at P' (160,yp,zp), so everything lies on the
  x=160 plane, which makes it easier to solve.
* the point on the screen where you would plot the pixel
  corresponding to the point in 3D is S' at (160,ys',zs).

Now you have to notice that we have two similar triangles:
- Triangle ECS' and Triangle EQ'P' are similar.

This means that the *ratio of the corresponding sides* of the triangle
is the *same* for ECS' and EQ'P'! So we have:

        EC      CS'
       ---- = ------     ... Eq. 5
        EQ'    Q'P'

Looks just like equation 1, huh? I told you that the x and y's can be
solved in the same way! The rest of the derivation looks similar too!
Just keep the numbers straight, and you'll be fine.

Plugging the lengths of the sides of the triangle into equation 5, we
get something that looks a lot like equation 2:
(remember the distance between the eye and center of the screen
is 640 pixels for SCREEN 13.)

          640         ys'-100
      ----------- = -----------     ... Eq. 6
      640+(zp-zs)      yp-100

Again, the screen is at z=0, so zs=0 and things get easier.

        640       ys'-100
      ------- = -----------     ... Eq. 7
      640+zp       yp-100

We want to solve this for ys', so here it goes:
- multiplying both sides by (yp-100), we get

                 640*(yp-100)
      ys'-100 = --------------     ... Eq. 7b
                    640+zp

- adding 100 to both sides, we get

             640*(yp-100)
      ys' = -------------- + 100     ... Eq. 8
                640+zp               (origin at top left corner of screen)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HOW COME WE CAN ASSUME X=160 ?

When we solve for ys, why can we forget about the x coordinate and
assume it is 160? I can say it works by analogy, but that's not a proof.

Here is a physics-based explanation:
    If I were standing on the side of a flat street looking toward the
    other side, while the cars were passing by in the x direction
    (horizontally), I wouldn't see the cars moving up and down, would I?
    [Now if this was a sloped street, cars going horizontally would be
    either taking off or crashing into the ground, like in "Back to the
    Future," but that's another story.]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because of the above reasoning, once again, we can generalize our
equation to one that projects any 3D point to the screen, without doing
any extra work! So ys = ys' if point P' is at the same position as
point P above.

                   640*(yp-100)
      ys = ys' = -------------- + 100     ... Eq. 8a
                      640+zp              (origin at top left corner of screen)

Together, equation 4 and equation 8a give us the complete formula for
plotting 3D points (which have their origin at the top left corner of
the screen, with the +x axis going to the right, the +y axis pointing
down, and the +z axis pointing into the monitor) onto the screen.
Here they are again.

            640*(xp-160)
      xs = -------------- + 160     ... Eq. 4
               640+zp

            640*(yp-100)
      ys = -------------- + 100     ... Eq. 8a
               640+zp

Wait! "Top left corner of the screen?" That means (1,-1,1) will be
plotted off the screen! Ok, we'll fix this, but there's another problem
for people used to the y axis pointing up.
The y-axis on our coordinate system points down! To correct this, we
have to return to equation 7b. (don't worry, it's only a small change!)

                    640*(yp-100)
      -(ys'-100) = --------------     ... Eq. 7b
                       640+zp         [+y axis is up in 3D point, down on screen]

Look, all we had to do was add a minus sign! Now this makes a small
change in equations 8 and 8a. Here it is:

                  640*(yp-100)
      ys = 100 - --------------     ... Eq. 8a
                     640+zp         [y axis fix]

We didn't have to change equation 4 because the screen coordinate
(abbreviated "screen coord" below) agrees with the Cartesian coordinate
system (defined by the x, y and z axes) we used.

[to make the origin of points at center of screen]
(Note: These xp and yp variables are measured from C, so they have
values different from the xp and yp in Eq. 4 and 8a.)

                  640*(xp+160-160)
      xs = 160 + ------------------     ... Eq. 4c (origin at C)
                       640+zp           [y axis fix, origin at C]

                  640*(yp-100+100)
      ys = 100 - ------------------     ... Eq. 8c (origin at C)
                       640+zp           [y axis fix, origin at C]

*******************************************************************
Simplifying, we get a formula that works pretty well for plotting
3D points in SCREEN 13.

                  640*xp
      xs = 160 + --------     ... Eq. 4c' (origin at C)
                  640+zp      [y axis fix, units in pixels]

                  640*yp
      ys = 100 - --------     ... Eq. 8c' (origin at C)
                  640+zp      [y axis fix]
*******************************************************************

'   [How things look with the origin at C (orthogonal top-down view)]
'           (0,yp,zp)
                  Q+--------- * P(xp,yp,zp)
                   |         /   (note: values of xp,yp,zp are different than before)
                   |        /
(-160,ys,0) (0,0,0)|       /     (160,ys,0)
 |=================C======S===============|
                   |     /  (xs,ys,0)
                   |    /  pixel for point
                   |   /
                   |  /
                   | /
                 E |/ eye(0,0,-640)
 ////////////////behind eye/////////////////
 ///////////////////////////////////////////

Likewise, we can move the origin to the eye if you want, although this
*isn't* always the best thing, because a point at the origin will crash
your 3D engine with a division by zero (it's equivalent to poking
yourself in the eye), unless you write an IF statement to handle the
special case! (In fact, all points with z coordinates on or behind the
eye shouldn't be displayed!) But this is actually what most 3D engines
do (including OpenGL) when doing the perspective transform.

(Note: LET xp3d = xp from Eq. 4c'
           yp3d = yp from Eq. 8c'
           zp3d = zp+640 )

                  640*xp3d
      xs = 160 + ----------     ... Eq. 4e' (origin at E, y-axis fix)
                    zp3d

                  640*yp3d
      ys = 100 - ----------     ... Eq. 8e' (origin at E, y-axis fix)
                    zp3d

Well, if we take a quick look at the xs = x/z, ys = y/z from the
introduction, you'll see that 4e' and 8e' are very close. (Just take off
the centering addition and the *640, which multiplies the x and y by
the eye to screen distance.) To really get x/z and y/z, you have to
measure everything in special units so that the distance from the eye
to the screen is defined to be 1, use the coordinate system with the
origin (0,0,0) at the eye, and do WINDOW SCREEN (-160,-100)-(160,100)
to center the screen at (0,0,zs). Although that is nice in theory, when
you write a game engine, you don't want to be doing extra divide
operations, so the forms presented in equations 4e'+8e' or 4c'+8c' work
the best. I suggest that you work out the math to prove to yourself
that this is true.
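
If you would rather check this with numbers than with algebra, here is
a small sketch. The sample point and all variable names (d!, xsC!, xsE!,
xn! and so on) are assumptions picked for illustration. It projects one
point three ways: with Eq. 4c'/8c' (origin at C), with Eq. 4e'/8e'
(origin at E), and with plain x/z and y/z after dividing every length by
the eye-to-screen distance.

```qbasic
' Compare the origin-at-C and origin-at-E formulas, then redo the
' projection in units where the eye-to-screen distance is 1.
' The sample point below is arbitrary (an assumption for illustration).
d! = 640                         ' eye-to-screen distance in pixels
xp! = 80: yp! = 50: zp! = 160    ' 3D point measured from screen center C

' Eq. 4c' and 8c' (origin at C)
xsC! = 160 + d! * xp! / (d! + zp!)
ysC! = 100 - d! * yp! / (d! + zp!)

' Eq. 4e' and 8e' (origin at E): only the z coordinate changes
zp3d! = zp! + d!
xsE! = 160 + d! * xp! / zp3d!
ysE! = 100 - d! * yp! / zp3d!

' Normalized units: divide every length by d!, so the screen is at z = 1
xn! = xp! / d!: yn! = yp! / d!: zn! = zp3d! / d!
xsN! = 160 + d! * (xn! / zn!)    ' plain x/z, scaled back to pixels
ysN! = 100 - d! * (yn! / zn!)    ' plain y/z, scaled back to pixels

PRINT xsC!, ysC!
PRINT xsE!, ysE!
PRINT xsN!, ysN!
```

All three PRINT lines should show the same pair of screen coordinates
(224 and 60 for this point, up to floating-point rounding). Notice that
the normalized version needs extra divides per point just to change
units, which is the cost mentioned above.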

Well, we have derived several formulas for perspective projection in
SCREEN 13, and we found out that x/z and y/z are accurate ways to do
perspective projection when we use the correct coordinate system and
units. We will finish this time by writing a simple 3D parametric
function plotter.

QBasic code (finally!)

DEFINT A-Z
SCREEN 13: CLS
'=====================================
' 3D Perspective Projection Test
'=====================================
'set grayscale palette
OUT &H3C8, 0   'start writing palette entries at color 0
FOR i = 0 TO 255: OUT &H3C9, i \ 4: OUT &H3C9, i \ 4: OUT &H3C9, i \ 4: NEXT

'draw wavy thing around zp=100 axis
FOR t! = 0 TO 6 STEP .001
    xp = INT(100 * COS(t!))
    yp = INT(100 * SIN(8 * t!))
    zp = INT(99 * SIN(t!) + 100)
    zdenom = (zp + 640)
    'perspective projection (world space to screen space)
    IF zdenom > 0 THEN                   'if point is in front of eye
        xs = (160 + xp * 640& \ zdenom)  'using equation 4c'.
        ys = (100 - yp * 640& \ zdenom)  'using equation 8c'.
        r = (640 \ zdenom)               'find size of point
        CIRCLE (xs, ys), r, 200 - zp     'plot it on the screen!
    END IF
NEXT t!

'draw helix around the y axis
FOR t! = 0 TO 60 STEP .001
    xp = INT(100 * COS(t!))
    yp = INT(t! + .5)
    zp = INT(100 * SIN(t!) + 100)
    xp3d = xp
    yp3d = yp
    zp3d = zp + 640
    'perspective projection (world space to screen space)
    'note how zdenom = zp3d
    IF zp3d > 0 THEN                     'if point is in front of eye, then
        'project the 3D point to the screen
        xs = (160 + xp3d * 640& \ zp3d)  'using equation 4e'.
        ys = (100 - yp3d * 640& \ zp3d)  'using equation 8e'.
        r = (640 \ zp3d)                 'find size of point
        CIRCLE (xs, ys), r, 200 - zp     'plot it on the screen!
    END IF
NEXT t!

Next time, I'll talk about how to change the field of view, so you can
get panoramic scenes or binocular zoom vision in your perspective code.