Friday

Enterprise Application Development Using PHP and Oracle

How McGill University's Department of Development and Alumni Relations supports up to 160,000 users over the web using PHP and Oracle

At McGill University's (Montreal, Quebec, Canada) Department of Development and Alumni Relations, we are in the process of moving extensive, multifaceted services for 200 to 300 paid and volunteer staff and up to 160,000 alumni and donors to the Web using an industrial-str
ength PHP application and Oracle9i. It is expected that the program will accommodate tens of thousands of users per hour following major e-broadcasts. We are doing this using Oracle's powerful row-level security features and PHP's high performance and quick development platform. This article will address the following:

* Background
* Why we chose PHP, Oracle9i, Apache, and Linux
* Structure of the application
* Security
* Synchronization
* Lessons learned and advice to developers.

Background

Our organization faces what many do:

* A central database application that is
o sprawling and cumbersome
o mission-critical and irreplaceable
o difficult or nearly impossible to change
o dependent on the manufacturer's version upgrade for modification
o not accessible to the internet for concerns of safety and cost
* A flood of semi-independent satellite databases and Web sites that are
o inadequately synchronized with the central database

o too numerous for proper security and upgrade maintenance
o providing end users with too many IDs and passwords
o not following established organizational business practices
* A growing demand for
o organizational efficiency and productivity through Web-based IT tools
o ease of use
o synchronization of data across functions
o scalability
o flexibility
o expandability of functionality
o enforcement of IT and general business practices

So, what are we doing about these challenges?

We've launched the first phase of a secure, Web-based modular database application. It is written in PHP and Oracle9i and promises to better serve users as it replaces satellite applications and Web sites. It synchronizes common data and enforces best business practices, both general and for data.

The application is called the Companion, as it augments the functionality of the revered central database, reports central data accurately, and sends back fresh data as it becomes available.

It comprises more than 100 tables divided into a half-dozen modules. The longest table has about one million rows. Several tables have 20 to 30 columns. Full use is made of Oracle's Fine-Grained Access Control (FGAC) and Virtual Private Database (VPD) for row-level security.

The Companion application consists of three domains, and it can accommodate more domains, all accessing the same data within the same PHP/Oracle application securely. These domains comprise the following:

1. Staff intranet
An intranet for 200 to 300 staff that contains all Web sites, online documents, and directories that one finds in many staff intranets, along with a suite of modules for doing all the things that satellite applications once did. The definition of staff has been expanded to include volunteers (we are a nonprofit) and selected employees from outside our unit. The staff intranet includes tools to create pages, articles, and events found in the other two domains.

2. "Constituents" intranet
A password-protected online community for up to 160,000 of the constituents we serve—in this case, our graduates

3. Public domain
An information site that links to publicly available sub-sites and news articles.

Why We Chose PHP, Oracle, Apache, and Linux

Apache and Linux. Both have reputations as being robust, low-cost, and secure (with proper configuration). Besides its impressive credentials as a database server OS, we found that Debian version of Linux (from the Debian Project at http://www.debian.org/) is particularly well-suited as a Web server because of its exceptionally high level of support among the open source community, the quality of its integrated software, and its reputation for stability.

PHP. We chose to use PHP over Active Server Pages, ColdFusion, and JSP as our Web server-side scripting environment because of its speed, cost benefits, extensive libraries, and customizability compared to other Web technologies. Using PHP also frees us from being tied to a particular vendor's set of development tools. Support is available on the Web through a network of Web sites and programmer forums that provide helpful tips on how to tackle advanced problems. Several Web sites also
offer a large variety of working code that can be used to develop components of a larger application. Most programmers easily understand PHP code because it uses syntax similar to that used in C. We discussed using Java for developing server objects that we had hoped would run more efficiently than scripted code. However, we have not yet found the need to integrate with Java, because PHP has proven to be flexible and fast enough to fulfill our requirements. Listing 1 shows an example of querying Oracle through PHP.

The downside of using PHP became apparent during more-complex object-oriented programming. Because it was originally built as a scripting language, the object-oriented features in PHP version 4 are minimal. This issue has been addressed in PHP version 5, where advanced object-oriented features have been introduced.

Oracle9i. Synchronization of the Companion and central databases without any third-party tools was one of our top requirements. Exchanging data between two Oracle9i systems would prove to be much easier than developing other means of converting and transferring data. The benefits of using Oracle9i outweighed the costs in our case, because we would be able to leverage preexisting PL/SQL code, thus saving development time. Because Oracle has also been ported to Linux, we now have more choices of database server operating systems on PC architecture. Although the central database runs on Oracle8i, we chose to develop using 9i to take advantage of its more advanced VPD features such as FGAC and the Secure Application Context, the only platform on the market with such capabilities.

Structure of the Companion Application

As with many software projects, the objective of our development team was to create a flexible and scalable application composed of a number of functionally independent modules based on a framework of reusable security and data access components. Here's how we achieved it.

N-tier architecture. In order to ensure the application's portability and ability to scale up if needed, the overall architecture would need to be layered—in other words, split into two or more tiers that would reduce the risk of rewriting the whole application if some of the underlying technologies become obsolete or inadequate to our needs.


After much thought, we made the decision in favor of a custom n-tier architecture that took into consideration application requirements and organization specifics, as opposed to a more conventional two- or three-tier approach. (See Figure 1.)

figure 1

Figure 1: Layered structure of the Companion

The presentation GUI tier is what end users actually see and manipulate with their local browser. This tier consists of HTML pages and forms rendered by underlying layers and includes client-side JavaScript and CSS (Cascading Style Sheets) for enhanced presentation and navigation. The main purpose of this tier is to collect user input and to present the output.

The presentation logic tier resides on the server and consists exclusively of PHP scripts that are separated logically by functionality. That includes presentation templates and layouts, validation routines, and scripts specific to particular modules. The main purpose of this tier is to validate and process user inputs and to prepare output based on data received from both the users and the lower tiers.

The business logic tier is where most business rules are implemented (for security's sake, certain critical rules are enforced exclusively on the database level and not exposed to the application level). It provides for a collection of PHP classes that encapsulate the rules specific to various parts of the system. Additionally, this tier contains the security framework that governs user access on the application level by providing mechanisms for user authentication, session management, and request authorization. In particular, the framework exposes a number of public methods that allow higher tiers to customize user interface based on security privileges. Finally, the tier contains reusable common classes employed across the application for data encryption, compression, datatype conversion, and so on.

The data abstraction logic tier is represented by an array of generic methods to interface the application to the actual data stored in the database. In other words, it provides an additional level of abstraction between the data tier and the rest of the system, which makes the application virtually independent of the database engine. After extensive testing we picked the ADODB library (http://php.weblogs.com/), another open source tool, to provide this functionality.

The data tier, a.k.a. Oracle, is undoubtedly the most important and complex part of the system. Its security mechanisms are explained in the security section below.

The physical layout of the Companion application is depicted in Figure 2:

figure 2

Figure 2: Physical layout of the Companion

Modules. Each module is a synthesis of data structures (tables and views) and PL/SQL packages. Both structures and packages integrate with the GUI application through the PHP classes in the business logic tier to deliver a set of thematically unified functions. Not only does modular construction make logical and coherent the work of developing and organizing functionality, it does the same for security.

Modules are logically separate from each other. Each has a unique set of security policies, its own folder on the Web server full of its PHP application files, and its own GUI sub-site and table set. Every table name is prefixed by two letters representing the module that the table belongs to. Files, policies, procedures, and other stored sets of code are named according to a convention that identifies them as part of a particular module.

Modules in one sense are separate from each other, but each is embedded in the larger application. Modules share the same navigation elements and common data, and they all reuse classes from the business logic tier.

Table HR_Employee

c1_id c1_emp_category web_emp_photo c2_emp_email
------- ---------------- -------------- ------------
M290171 central staff (blob) james.milton@mcgill.ca
C179022 central staff (blob) doris.seagal@mcgill.ca
M109022 casual (blob) luke.grande@mcgill.ca
C390101 work-study (blob) patrick.roy@mcgill.ca
M203400 faculty based staff (blob) mcmaster.philip@mcgill.ca

Prefix code for Tables and Fields:

HR - Human Resources module (each module has a 2 letter prefix for each of its tables)
C1 - Central Database 1*
C2 - Central Database 2*
WEB - Companion Database Application*

* Fields are prefixed by their source database.
Since this application is a companion database,
it synthesizes and uses data from a variety of
sources, most especially a primary central DB.


Although most modules follow similar security structures and philosophies, it was at the beginning of the project that each of our modules would require security arrangements specific to its needs. An example of common security philosophy is that of the gurus. To each module is assigned roughly three gurus. Gurus, mostly qualified nontechnical users, have full administrative privileges within their module and can assign rights to other administrators of lesser security rank. An example of differing security philosophies is that of the confidential information in the HR module. When employees access their confidential information, their row-level security hinges on their Entity ID. In most other modules, however, their row-level security hinges on their job's Position ID.

Security

Given the sensitivity of our data, restricting access at the application level through traditional cookie-based session mechanisms was not enough to guarantee adequate security. On a database with complex rules for access and expectations for heavy traffic, the best way to achieve performance while managing row-level security is by pushing most security functions — those which enforce the specific rules for data restriction — to the database level.

Securing data access at the database engine level through VPD puts access control at the lowest level and ensures that a thief's query tools, which can bypass application security mechanisms, cannot reach the data. Additionally, it provides the organization with lower cost of ownership: build security once, in the data server, instead of building security into every application tool that accesses the data. Finally, it centralizes data access management and control.

The Companion utilizes Oracle's traditional Basic Role Security and User Account Security. More important, it uses two powerful VPD tools: Secure Application Context and FGAC. (For a great explanation of VPD, see otn.oracle.com/deploy/security/oracle9ir2/pdf/VPD9ir2twp.pdf). Using these four security concepts as our framework, let's examine how our business model fuses with Oracle.

Basic User Account Security. Security and database architecture is structured around a master list of names updated daily from the central database. These names are referred to as entities, not users.

The Companion contains two user groups, staff and graduates. A user group is treated by Oracle's basic security as a single account. That means, for example, that 30,000 simultaneous users share just two accounts. Other users (including some entities and all nonentities) may not share either account but may still access unrestricted public data. (Note: In this article, user refers to a person who accesses the system, while account refers to an Oracle basic user account.) As shown in Figure 3, users are a subset of entities and members of the public.

figure 3

Figure 3: Users: a subset consisting of some DB entities and some members of the public.

Security in Action

Here's an example of several uses of Oracle security: First the script (see Listing 2) creates users and tables in a sample schema called web_schema. Test data is then inserted into tables. Database context and procedure to set the context are then created. Next, a function using database context and returning SQL predicate for command DELETE is written in the security manager schema MANAGER_SCHEMA. Policy is created and attached to the table web_page. At last the results are tested. Note that none of the test users has access to the table web_editor, where Web page authors and editors are kept.

This policy restricts access to DELETE rows in the table web_page to page authors identified in the web_editor table as users with context our_context, where our_context equals id (Entity ID) and editor_code equals authors.

Access to the row is possible only when appropriate context is attached to the user session.

conn web1/webbie1@yourdatabase;
delete from web_schema.web_page;
0 rows deleted;
-- some web1 user who is neither author, nor editor
exec manager_schema.app_context('120');
delete from web_schema.web_page;
0 rows deleted;
-- some web1 user who is an author for page 2
exec manager_schema.app_context('102');
delete from web_schema.web_page;
1 rows deleted;

This sample is simplified to convey the concept. Each SQL statement (SELECT, INSERT, UPDATE, DELETE) applied to the table could have none, one, or more policies (in which case-generated predicates are ANDed).

So why assign only two accounts for all authenticated users to share instead of assigning a separate account for each? Because security for many applications needs to be managed not according to which account someone holds but according to the various attributes one can ascribe to them. The tools to achieve this goal will be explored in the sections below on VPD.

Basic Role Security. When a user accesses the Companion application by clicking on a link, she is, from a security standpoint, accessing a module. Each module has associated with it an Oracle security role that is assigned or denied to each user account. These security roles govern object-level security for tables, views, procedures, and functions. Basic Role Security simply assigns broad user rights to the object (select, insert, execute, and so on) and is the first step in establishing access. Specific user rights are handled by FGAC.

VPD Secure Application Context. The Companion uses Oracle's Secure Application Context to assign an Entity ID to a user at login. From two simple identifiers, the account (staff or grad) and the context (Entity ID), most access considerations are established and the rest are easily derived.

These identifiers for a user's login session are carried by an encrypted security token that is read every time a query (i.e., a click on a hyperlink to something in the database) is made. User login (name/password) along with Entity ID are stored at the Web server for the duration of the session in the encrypted token. While login is being used to establish connection to the database and get access to the data structures, Entity ID is used to refine data access to the row level through security policies.

VPD Fine-Grained Access Control. Using the Entity ID identifier provided by Secure Application Context, Oracle security policies are able to govern access at the row level. These FGAC security policies are—as with Basic Role Security—attached to objects. Unlike Basic Role Security, FGAC policies hone access to attributes associated with the Entity ID. The security policies attached to all nonpublic database objects help to seamlessly strip the restricted data. Therefore, even if the database login name and password are stolen, attempting to run a query (using a standard SQL tool) against any policy-protected table would result in an empty or limited record set. A policy evaluates the user's attributes and the row's attributes and decides which, if any, rows the user may see. For example, only entities who live in Vancouver and who are on the invitation list for the West Coast Executive dinner may view the list of registrants to the same event. Another example: an editor assigned to a Web site may not delete its pages, but an author of that site can.

Affinities. Our 160,000 entities are a varied and complex constituency. Each has a number of affinities, such as age, degree, financial grouping, region, and so forth. In order to make it easy for lay administrators to assign access rights, many such attributes have been summarized into a list of affinity groups. An affinity look-up table is updated daily and lists all entities and their corresponding affinities. FGAC uses this table to manage access to data rows that have been assigned affinities as well. Affinities are also used to strategically push personalized content to site visitors.

Figure 4 depicts the five steps of data request filtering that occur when a user calls a page.

figure 4

Figure 4: The five steps of data access filtering.

Synchronization

Keeping the data stored on the Companion and on the central database synchronized became a challenge because both are being constantly updated. We devised a system that is slower than real time but guarantees conformity to strict data standards.

For starters, a copy of "synchronized" data from the central database is refreshed nightly on the Companion (See Figure 5). The most important object from this dump is the Entity table, which forms the basis of our internal security.

This set of synchronized tables is actively used in Companion modules but is not updated by them. In fact, these updates are placed in holding tables and, if necessary, are carefully edited by records staff using a specially designed client-server interface. Updated data is then fed back to the central database once it has been edited to conform to standards. Because all rows in the central and Companion databases are date-time stamped, the programs that manage this flow carefully watch for updates on the same record that are potentially in conflict. Accurate, updated data is able to flow to the central database in greater quantity and with less manual effort than before.

Because our database is a Companion, it borrows fields and data from a few other big databases. The name of every field in the Companion is prefixed by a one-letter code representing its source database. By forcing recognition of data source into the name of every field in every table, opportunities for synchronization are not missed.

figure 5

Figure 5: Companion Database Synchronization

Lessons Learned and Advice to Developers

The key lessons Kardasz's team learned from this ongoing project are as follows:

* Coding standards: Having style guides and formats in place for how to structure code, name fields, and name tables is beneficial to programmers. Other rules and formats exist and are consistently followed for use and placement of stored procedures and projects.
See Style Manual

* Comments: Oracle allows architects to comment on every field and every table. TOAD (http://www.qsft.com/toad/) has functions for producing data dictionary reports that include these comments. This way, you comment as you go. We did not do this at first, and it was a pain to go back and do it later. You'd be surprised how much you forget in a short while.

* Threads: Oracle Multithreaded Session Handling performs better than concurrent sessions. We should have done this at the beginning.

* Row-level security: It is much easier and more intuitive to design and program a system of row-level security that is to run from tables and application code. For each row of data, there are corresponding security attributes stored in a table somewhere. Unfortunately, this kind of application-level security slows performance to a crawl and is vulnerable to oversights in application programming. For performance and scalability, we recommend investing in Oracle9i's Fine-Grain Access Control and Virtual Private Database technology, in spite of the challenging learning curve.

* Team size and extreme programming: We started out practicing a somewhat informal "extreme programming." Later we found that even in a tiny group of programmers (in this case, three), development proceeded more efficiently when responsibilities, functions, and expertise were more clearly defined. Support from our DBA guru, networking guru, and Oracle programmers who look after the central database has been invaluable.

* Quality assurance: Due to lack of resources, we did not have a quality assurance tester dedicated to the project. We relied instead on various technical and nontechnical staff to offer feedback. This informal "volunteer" testing staff did not work well. Many bugs and missing functions went undetected for a long time. As a result, debugging and modifications took much longer than expected.

* Copy server: We maintain not only a set of test servers (Web server plus Oracle9i server) for all development but also an Oracle9i copy server that is refreshed from production every 24 hours. The copy server is a fast and easy backup. Our experience with restoring files and data from the copy server has been exceedingly positive. The copy server is ready to replace the production DB server at a moment's notice. Often, modifications on the test database server is first ported to the copy server and tested prior to their final upload to the production servers. We are also looking to implement Oracle Standby database, thus ensuring zero data loss in the case of severe production server failure.

* Backups: We still maintain a traditional tape-backup system with off-site storage. Unfortunately, backup tapes are difficult to work with, prone to failure, and slow to retrieve. We like a dumping mechanism better. The Oracle9i database and Web server files are dumped to an available drive on another server. Dailies are kept for one week. Weeklies are kept for one month.

* Resource Allocation: We estimate that the total time spent on PHP and PL/SQL programming was only 22% of allocated time. This small figure further confirms to us the low-cost benefit of these tools. Figure 6 shows a complete breakout of time allocation.

figure 6

Figure 6: Project Time Allocation

We feel that the sky is the limit using the development model described in this article. The model, in sum, consists of the companion database holding a core of synchronized data, multi-tier application structure, semi-independent modular development, core security components that provide the framework for new modules, a full range of Oracle security features (especially FGAC), and the rapidity and reliability of PHP programming. The functionality, accuracy and relevance of our powerful central databases are augmented, while new functionality for our users is added with minimal logistic obstacles every time a new module is released. Overall, the most satisfying thing is how well it works for users and how manageable technical support is.

Kevin Kardasz (kevin.kardasz@mcgill.ca), McGill University's IS manager for Development, conceived of the project and acts as project manager. Vadim Kudryavtsev (vadim.kudryavtsev@mcgill.ca), senior programmer, is their database development and security expert (VPD, FGAC). Robert Mark (robert.mark@mcgill.ca), Web programmer, is their top PHP expert. Mikhail Seliverstov (mikhail.seliverstov@mcgill.ca), Web programmer, is an expert business analyst and system architect.

By Kevin Kardasz, Vadim Kudryavtsev, Robert Mark, and Mikhail Seliverstov
courtsy: oracle.com

No comments:

Financial Engineering