Web Programming vs. "Normal" Programming
Two important factors distinguish dynamic web programming, like we see in Joomla, from what we'll call "normal" programming, like a typical desktop application such as a spreadsheet program. These have to do with how the state of the program is maintained and what type of command validation is required.
Maintaining the State of the Program
The first difference is how the state of the program is maintained during the execution of the program. By state, we mean what the program knows about itself and its environment as stored in the working memory of the computer. We can think of state as a software program's version of consciousness: its awareness of who it is and what has been going on. Let's compare how a desktop spreadsheet program and Joomla each maintain state.
Let's illustrate this by thinking of our software programs as Aladdin with his magic lamp. Imagine that, when we issue a software command, inside the computer Aladdin is actually getting a software genie to do the work. The important difference is that the spreadsheet genie is happy to do as many wishes as we like. By contrast, the Joomla genie only grants one wish each time.
With our spreadsheet program, the first thing we do is load the software by clicking on a desktop icon. Aladdin sees this and rubs the lamp to make the genie appear. The genie comes out of the lamp and Aladdin commands "Display the spreadsheet software on the user's screen!". The genie does this and awaits Aladdin's next command.
Next, we tell the software to open the "budget" file. Aladdin transmits this command to the genie and the file is opened. This process continues until we close the program. At this time, Aladdin tells the genie "You are no longer needed. Go back into the lamp!", and the spreadsheet genie disappears back into the lamp.
Now let's see how this works with Aladdin and the Joomla genie. We start the process by loading the URL for our home page into our browser. Aladdin sees this and rubs the lamp. The Joomla genie appears and Aladdin commands "Display the URL!". The genie does his magic and our home page shows in the browser. However, since the Joomla genie only does one command at a time, he immediately disappears back into the lamp!
Now, we click on a menu item in the home page to display the "Parks" article. Aladdin has to rub the lamp again, and again the genie appears. Aladdin commands "Load the 'Parks' article!". The genie loads the new page into the browser and then immediately disappears back into the lamp.
This process continues until eventually we close down the browser or navigate out of the Joomla site. At that time, Aladdin doesn't have to do anything, since the genie is already back in the lamp. This is very important. With web programming, we can't rely on the user to nicely close down the program. Fortunately, we don't need to.
The tables below show these examples step by step.
User | Aladdin | Spreadsheet Genie |
Clicks spreadsheet icon. | Rubs lamp and says "Open the spreadsheet program!" | Comes out of the lamp, opens the spreadsheet program, and awaits next command. |
Selects the "budget-1" file to open. | Tells genie "Open the "budget-1" file!" | Opens the file and waits. |
Issues more commands. | Transmits commands to genie. | Executes each command and waits. |
Selects the exit command. | "Close the program and return to the lamp!" | Closes the program and disappears back into the lamp. |
Figure 15: Command Sequence with Spreadsheet Genie
User | Aladdin | Joomla Genie |
Enters home page URL into browser. | Rubs lamp and tells genie "Load the home page URL!" | Comes out of the lamp, displays the URL in the browser, and disappears back into the lamp. |
Clicks on the Parks article link. | Rubs the lamp and tells genie "Open the Parks article!" | Comes out of the lamp, opens the file, and goes back into the lamp. |
Issues more commands. | Rubs lamp and transmits each command to genie. | Comes out of the lamp, executes each command, and goes back into the lamp. |
Closes the browser. | No action needed. | No action needed. |
Figure 16: Command Sequence with Joomla Genie
With a web program like Joomla, each time you click a link or a form submit button, you are starting what we call a new request or command cycle. The URL, any form data, and other information related to the request are packaged up by the browser and sent to the web server.
With Joomla (or any other web program), nothing is remembered in the computer's working memory between request cycles. Each cycle has to start over to create all of the program objects. The Joomla genie starts from scratch each time.
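The contrast can be sketched in a few lines of Python. This is only an illustration, not Joomla code: the class names, the `serve` function, and the URLs are made up for the example. The desktop-style object lives for the whole run and accumulates state, while the web-style handler is rebuilt for every request and so knows nothing about earlier ones.

```python
class SpreadsheetProgram:
    """Desktop-style program: one object lives for the whole run."""

    def __init__(self):
        self.commands_seen = []  # state kept in working memory

    def handle(self, command):
        self.commands_seen.append(command)
        return list(self.commands_seen)


class WebRequestHandler:
    """Web-style program: a new object is built for every request."""

    def __init__(self):
        self.pages_seen = []  # always starts out empty

    def handle(self, url):
        self.pages_seen.append(url)
        return list(self.pages_seen)


def serve(url):
    # Each request cycle creates its objects from scratch, then discards them.
    handler = WebRequestHandler()
    return handler.handle(url)


desktop = SpreadsheetProgram()
desktop.handle("open budget-1")
desktop_state = desktop.handle("sort column A")  # both commands are remembered

serve("/index.php")
web_state = serve("/index.php?view=parks")  # only the current request is known
```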
Given this, how does the Joomla genie "remember" important information from one request cycle to the next? For example, he needs to know who the user is so that he can check which actions that user is allowed to perform. If his mind is a complete blank at the start of each cycle, how can he do this?
The answer is that we have several ways to store data across cycles. The most common one is the session variable. This is maintained on the server and is specific to the user for this session. It is stored on the server's disk and is available to Joomla. Normally, the session file is automatically deleted or disabled after a period of inactivity (for example, 15 minutes). From the session, for example, the Joomla genie can identify the current user without requiring that the user log in each time. It can also "remember" where the user was in the last command cycle and what options the user might have entered (for example, how a column was sorted in a screen).
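To make the idea concrete, here is a minimal sketch of a server-side session store with an inactivity timeout. It is not how Joomla or PHP implements sessions. The `_sessions` dictionary, the `load_session` function, and the `now` parameter (used so the example can simulate the passage of time) are all assumptions made for illustration. A real server would keep this data outside the program's memory, and the browser would send the session ID back with each request, typically in a cookie.

```python
import time
import uuid

SESSION_TIMEOUT = 15 * 60  # seconds of inactivity, as in the 15-minute example

# Hypothetical server-side store: session ID -> (last activity time, data)
_sessions = {}


def load_session(session_id, now=None):
    """Return (session_id, data), starting a fresh session when needed."""
    now = time.time() if now is None else now
    entry = _sessions.get(session_id)
    if entry is None or now - entry[0] > SESSION_TIMEOUT:
        # Unknown or expired session: drop any old data and start over.
        _sessions.pop(session_id, None)
        session_id = uuid.uuid4().hex
        data = {}
    else:
        data = entry[1]
    _sessions[session_id] = (now, data)  # record this request as activity
    return session_id, data


# First request: no session yet, so one is created and filled in.
sid, data = load_session(None, now=0)
data["user"] = "alice"
data["sort_column"] = "title"

# A request five minutes later with the same ID sees the stored values.
_, later = load_session(sid, now=5 * 60)

# After more than 15 idle minutes the old session is no longer available.
new_sid, expired = load_session(sid, now=5 * 60 + 16 * 60)
```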
The database is another way to save information from one command cycle to the next. It is updated as we make changes to the site, for example, by adding articles or other component items, or by changing our user profile. When we access the database in future cycles, we will see the updated information.
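The same point can be shown with a small sketch that uses Python's built-in `sqlite3` module in place of the site's real database. The file location, the `articles` table, and the function names are hypothetical. Each function opens its own connection and closes it again, the way a separate request cycle would, yet the second cycle still sees what the first one saved.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a temporary directory
DB_PATH = os.path.join(tempfile.mkdtemp(), "site.db")


def save_article(title):
    """One request cycle: open the database, add an article, close."""
    conn = sqlite3.connect(DB_PATH)
    try:
        with conn:  # commits the transaction if no error occurs
            conn.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT)")
            conn.execute("INSERT INTO articles (title) VALUES (?)", (title,))
    finally:
        conn.close()


def list_articles():
    """A later request cycle: open the database again and read it."""
    conn = sqlite3.connect(DB_PATH)
    try:
        rows = conn.execute(
            "SELECT title FROM articles ORDER BY rowid"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


save_article("Parks")
save_article("Museums")
titles = list_articles()  # both articles survive across the separate cycles
```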
Using the session and the database allows Joomla to find information from previous command cycles. This allows the user to experience the different command cycles as a continuous program flow. However, it is important to keep in mind that each request cycle has to stand alone. We will see as we go along that this has important consequences for how things are done in the code.
Controlling and Checking the Commands
There is another difference between these two types of programming that has important consequences for security. With a self-contained desktop program, all of the possible commands are typically predefined in the program. Commands are usually selected with a mouse click from a list. Even if commands can be typed in directly, they are normally validated against a fixed list of possible commands and an error shows if the command is not valid.
With a web program like Joomla, we have two challenges that a desktop program normally doesn't have. First of all, we are exposing our site to the entire online world, which unfortunately includes people with bad intentions. We have to expect that someone will try to "hack" our web site. This could include someone trying to steal our administrative password, to deface the site (perhaps by replacing one of our files with their own), or to bring the site down by altering the database. We need to practice defensive programming to guard against this.
The second challenge is that we cannot control or limit the commands that come in as part of the request. Normally, the command will be a combination of a URL and possibly some field values from an HTML form. Most users will enter commands simply by clicking a link or a form submit button and will therefore always enter valid commands.
It is possible, however, that a user has deliberately entered a command to try to do something that they shouldn't do, for example by manually typing in a URL or altering the HTML form inside their browser. Unfortunately, there is no way for the web server to tell whether a user has clicked a link or manually entered a URL. Likewise, there is no across-the-board way to tell whether a user has simply filled out the form and pressed submit or whether they have modified the form to submit some malicious data.
To be safe, we must always assume that commands coming in with the request could be designed to attack or hack the site and we must examine them accordingly before we execute them.
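As a rough sketch of what "examining" a request can mean, the function below checks incoming values against what the program expects before anything is done with them. The parameter names `view` and `id`, the list of allowed views, and the function name are assumptions made for this example and are not taken from Joomla.

```python
ALLOWED_VIEWS = {"article", "category", "featured"}  # hypothetical command list


def clean_request(params):
    """Return validated values, or raise ValueError for anything unexpected.

    `params` is assumed to be a dict of strings taken from the URL or form.
    """
    view = params.get("view", "featured")
    if view not in ALLOWED_VIEWS:
        raise ValueError("unknown view")

    raw_id = params.get("id", "0")
    # Accept plain ASCII digits only, which rejects input like "42; DROP TABLE x".
    if not (raw_id.isascii() and raw_id.isdigit()):
        raise ValueError("id must be a non-negative integer")

    return {"view": view, "id": int(raw_id)}


# A normal request passes through with its values converted to safe types.
good = clean_request({"view": "article", "id": "42"})

# A tampered request is refused before it reaches the rest of the program.
try:
    clean_request({"view": "article", "id": "42; DROP TABLE users"})
    rejected = False
except ValueError:
    rejected = True
```

Checking against a list of known-good values, rather than trying to spot known-bad ones, mirrors the fixed command list that a desktop program validates against.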
We will talk more about security and defensive programming as we go along. However, the subject is important enough to warrant an example now to illustrate the point.
Let's say we have a simple comments system where users can enter comments about articles. We let anyone submit a comment, but we only allow authorized users to approve comments. A comment is not shown on the site unless it is approved, so we protect against inappropriate comments being shown on the site.
For this example, we have two fields, the comment and whether or not it is approved. We might implement this as follows. When we display the form, we check if a user is authorized or not. If they are, we show the form like this:
Figure 17: Example Comments Form
If the current user is not authorized to approve comments, we simply omit the Approved field on the form and only show the Comment field. So unauthorized users will never see the Approved field and therefore won't be able to check the box.
Now, we might think that, with this design, we have prevented unauthorized users from approving comments. But we have not. Someone with knowledge about how the application works could very easily use a program like Firebug or Web Developer to edit the HTML on the page to include the missing Approved field and set its value to approved. Then, when the form is submitted, the comment would be approved as if the user were authorized. The web server doesn't know whether the form was altered before the submit button was pressed. It just sees the form data in the request information.
So, this design has a serious security hole. How can we fix it?
One way would be to add a check before the database is updated. Even though a non-authorized user would not normally submit the form with the Approved field set to "Yes", we would nevertheless check this again before posting the comment to the database. In this example, before we update the database we would test that the user is authorized. If not, we would always set the Approved field to "No" and then save the data. That way, even if an unauthorized user adds the Approved field to the form, the invalid data won't get saved in the database, so no harm will be done.
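The fix might look something like the sketch below. It is not Joomla code: the function name, the form field names, and the use of a plain list as the "database" are assumptions made for illustration. The important point is that `user_can_approve` comes from what the server knows about the user (for example, from the session), never from the submitted form.

```python
def save_comment(form, user_can_approve, database):
    """Store a comment, honoring the Approved field only for authorized users."""
    comment = {
        "text": form.get("comment", ""),
        # The submitted value counts only if the server itself has decided
        # that this user may approve comments.
        "approved": user_can_approve and form.get("approved") == "yes",
    }
    database.append(comment)
    return comment


comments = []

# An unauthorized user adds the missing Approved field to the form by hand.
forged_form = {"comment": "Buy cheap watches!", "approved": "yes"}
save_comment(forged_form, user_can_approve=False, database=comments)

# An authorized user submits the genuine form with the box checked.
real_form = {"comment": "Nice article.", "approved": "yes"}
save_comment(real_form, user_can_approve=True, database=comments)
```

The forged comment is still saved, but as unapproved, so it is not shown on the site until an authorized user approves it.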
We will discuss other examples of security issues and how to fix them as we go along.