Wednesday

Building Semantic Web CRUD operations using PHP

When developing a Web application, it's standard practice to create a database structure on which server-side code is placed for the logic and UI layers. To connect to the database, the server-side code needs to do some basic creating, updating, deleting, and — most importantly — reading of records. As databases behind Web applications are typically relational databases, these CRUD operations are done using the well-known language, SQL. However, as Web development is increasingly occurring through object-oriented programming (OOP), the model is changing.

The Resource Description Framework (RDF) is a perfect way to describe objects while maintaining the meaning of that data. Simple Protocol and RDF Query Language (SPARQL — pronounced "sparkle") is the language typically used to query against that data, as it syntactically matches the structure of RDF itself. Both RDF and SPARQL are technologies within what has been dubbed the Semantic Web stack.

To fully embrace the Semantic Web idea, you can apply traditional Web-development techniques to RDF data using SPARQL. This article shows how to use a simplified Model-View-Controller (MVC) design pattern, the PHP server-side scripting language, and SPARQL for connecting to RDF — as opposed to using SQL on a relational database system.

SQL and SPARQL CRUD operations

Prerequisites

This article assumes a basic understanding of SQL, PHP, and Web application development. An understanding of Semantic Web is also beneficial. To run the create, update, and delete commands on Semantic Web-based data, you need a Semantic Web data store that supports the SPARQL/Update specification.

It's worth taking a look at the similarities and differences between CRUD operations when developed in SQL and SPARQL. Listing 1 shows the SQL code for a read operation.


Listing 1. SQL for the read operation
SELECT realname, dob, location FROM UserTable  WHERE realname = "John Smith"; 

Compare that SQL-based code with the SPARQL-based code shown in Listing 2. These are two read operations because they're the easiest to understand, implement, and explain. This is true for both SQL and SPARQL.


Listing 2. SPARQL for the read operation
     PREFIX foaf:   PREFIX rdf:  SELECT ?uri ?name ?dob ?location FROM  WHERE { ?uri rdf:type foaf:Person ; foaf:name "John Smith" ; foaf:birthday ?dob ; foaf:location ?location . } ; 

Your first thought when comparing the two listings is likely to be that the SPARQL version clearly has many more lines than the SQL version. That is true, but don't be tricked into thinking that the SQL is necessarily simpler and cleaner. SPARQL, depending on the engine that you run it against, can be completely distributed through something known as the linked data effect. In addition, it allows you to have dynamic schemas because of its interlinked object-oriented perspective, in contrast to SQL's strictly relational perspective. If you were to split relational database tables into as many islands of data, you would actually have many more lines of SQL in comparison to SPARQL — not to mention that the SQL would be full of those nasty JOIN descriptors.

The first two lines of the SPARQL are the PREFIX declarations. According to Semantic Web theory, everything — whether an object or a data graph source (also an object) — has a Uniform Resource Identifier (URI). The PREFIX lines are simply applying a temporary label to some URIs — in this case, the Friend of a Friend and RDF schemas. The benefit here is that you can use the PREFIX declarations later in the query instead of having to use the full URIs.

The next line of the SPARQL code describes the query request. It's essentially the same as the SQL statement, except for the additional request for the URI. Take note of the use of question marks (?) to indicate that the term is a variable.

The FROM statement describes where to grab data. It's the same in SQL and SPARQL, except that in SPARQL, the data source name is a URI, rather than a string denoting a physical location on your computer or network.

The WHERE statements are quite different from each other because with SPARQL, you must specify which schemas to use to fetch data. Once again, if you tried to do this using traditional methods, you would need a lot more than plain SQL: You'd need to use the PHP, the Java™ programming language, or some other server-side language to do checking between data sources. It is reasonably clear what the lines of SPARQL do, which includes ensuring that the data being retrieved is only of the type Person. SPARQL fetches a name and a location while doing some pattern matching to find the right John Smith.

Create

CRUD operations in SPARQL are typically a bit more shrouded in mystery than the read operation. However, they can be done. To begin, the create operation inserts a new record or object into the table or graph.


Listing 3. SQL for the create operation
     INSERT  INTO UserTable (realname, dob, location)  VALUES ("John Smith", "1985-01-01", "Bristol, UK");  

Now, compare the SQL-based code in Listing 3 with the SPARQL-based code in Listing 4 for the same create operation.


Listing 4. SPARQL for the create operation
     PREFIX foaf:   PREFIX rdf:  INSERT  INTO GRAPH   (?realname, ?dob, ?location)  {  rdf:Type  foaf:Person ;   foaf:name "John Smith" ;   foaf:birthday  <1985-01-01T00:00:00> ;  foaf:location "Bristol, UK"  } 

Once again, notice that the PREFIX lines work exactly as they do in the read operation with SPARQL. The INSERT INTO works similarly to SQL, but again, this is URI-based rather than string-, table-, and name-based, which allows the operation to be done across HTTP. You must also specify the schema again. Here, it's slightly easier to understand than in the read operation, as you can have practically any kind of property so long as it is compatible with the schema. This is a benefit and a beauty of the distributed dynamically extensible objects formalism RDF provides.

Delete

If you create, at some point, you are probably going to want to delete. For instance, users may want to delete their account on your site (obviously unfortunate that they want to leave, but they may have valid reasons). Listing 5 provides the SQL code for a typical delete operation.


Listing 5. SQL for the delete operation
     DELETE FROM UserTable  WHERE realname = "John Smith" 

Now, compare the SQL-based code in Listing 5 with the SPARQL-based code in Listing 6.


Listing 6. SPARQL for the delete operation
     DELETE  FROM GRAPH  { ?predicate ?object } 

The fundamental difference between the SQL and the SPARQL code is that the SQL deletes a row in a table, whereas the SPARQL deletes "all triples" relating to the "John Smith" object denoted by http://www.example.org/graph/johnsmith#me. This difference is the result the graph-based nature of the RDF model.

Update

Many Web applications allow users to update their information. The UPDATE operation is what makes that possible. Listings 7 and 8 demonstrate how to code this in SQL and SPARQL.


Listing 7. SQL for the update operation
      UPDATE UserTable SET location = "Weston-super-Mare, UK" WHERE realname = "Joanne Smith" 

Now compare the SQL-based code in Listing 7 with the SPARQL-based code in Listing 8 for the update operation.


Listing 8. SPARQL for the update operation
     PREFIX foaf:  MODIFY   DELETE {?uri foaf:location ?location}  INSERT {?uri foaf:location "Weston-super-Mare, UK"} WHERE { ?uri foaf:name "Joanne Smith" } 

UPDATE with SPARQL may seem incredibly silly, but it's completely valid when you understand that you aren't updating a relational table row — you're updating one very specific relationship within a graph. The easiest way to do this without attaining multiple locations is to delete and insert. The MODIFY keyword is used to establish the connection to the right graph.

Connecting to SQL and SPARQL database systems

To execute the above SQL and SPARQL statements, you must connect to the system somehow. Different systems, obviously, have different connection methods. One common method is to connect to a generic database using Open Database Connectivity (ODBC) drivers, which are often included in current versions of Mac OS X and Linux® systems, and are installable on other operating systems such as Windows®. ODBC essentially provides a simple generic API to connect to an SQL database of your choice. Interestingly, ODBC also works with some Semantic Web data stores such as OpenLink Virtuoso. However, most other Semantic Web data stores require some custom connection procedure or a custom-made generic connection system such as RDF2Go (at the time of writing, RDF2Go is only for Java technology) that work with a variety of systems such as Seseme and Jena. An alternative to consider if your data is going to be exposed over HTTP is a SPARQL connection method over HTTP, which makes your data "Linked Data"-ready and can be completely distributed. Because the range of connection methods for SQL and SPARQL vary, it isn't feasible to cover these in any detail here.

SQL and SPARQL through PHP

After selecting a connection method, the traditional next step is to establish common operations in PHP. For a customized system, this is usually done using SQL strings with PHP variables injected and passed to it through function parameters. The function will then connect to the database and execute this transaction. The proposal here is to do exactly the same for a SPARQL-RDF connection as would be done for an SQL-RDBMS connection.

So, take a gander at another code comparison — this time using the PHP language — starting with read, then going on to create, delete, and update, as with the code comparisons above. Use a hypothetical query execution function called query_execute, which takes in a string representation of the SQL/SPARQL statement.

Read

First up is the simple read operation. Variables can be injected into the query using string concatenation in PHP.


Listing 9. SQL for the read operation
     function readUserInfo($realname) {     $sqlstatement = "SELECT realname, dob, location FROM UserTable WHERE     realname = \"" + realname + "\";";     return query_execute($sqlstatement);     } 

Now, compare the SQL-based PHP code in Listing 9 with the SPARQL-based PHP code in Listing 10 for the humble read operation.


Listing 10. SPARQL for the read operation
      function readUserInfo($realname) {     $sqlstatement = "PREFIX foaf:      PREFIX rdf:  SELECT ?uri ?name ?dob ?location  FROM  WHERE  { ?uri rdf:type foaf:Person ;  foaf:name \"" + $realname + "\" ;  foaf:birthday ?dob ; foaf:location ?location . } ;";     return query_execute($sparqlstatement); } 

As you can see, the above function has been developed in such a way that the function name and parameters are identical, which means that you could start using the SPARQL version straightaway. Plus, if you're using an ODBC system, you won't need to worry about return types changing.

Create

As above, so below. The following PHP methods inject variables into a string to run a dynamic query. Listing 11 shows how to do that with a create operation.


Listing 11. SQL for the create operation
      function createUserInfo($realname, $dob,     $location) {     $sqlstatement = "INSERT INTO UserTable (realname, dob,     location) VALUES (\"" + $realname + "\", \"" + $dob + "\", \"" +     $location + "\");"; return query_execute($sqlstatement); } 

Now compare the SQL-based PHP code in Listing 11 with the SPARQL-based PHP code in Listing 12 for the create operation


Listing 12. SPARQL for the create operation
      function createUserInfo($uri, $realname,     $dob, $location) {     $sparqlstatement = "PREFIX foaf:      PREFIX rdf:  INSERT INTO GRAPH      (?realname, ?dob, ?location)      {      " + $uri + " rdf:Type     foaf:Person ;     foaf:name \"" + $realname + "\" ;     foaf:birthday         <" + $dob + ">> ;         foaf:location \"" + $location + "\}" return query_execute($sparqlstatement);     } 

For simplicity, the SPARQL version of this function includes a URI parameter, although it is simple enough to do some additional string concatenation to make the function declaration identical to the SQL.

Delete

Use injection through concatenation in PHP to perform the delete operation. Please note, as mentioned in the previous section, that different variables pass into the SQL and the SPARQL versions. Listing 13 shows the SQL code for the delete operation.


Listing 13. SQL for the delete operation
      function deleteUserInfo($realname) {     $sqlstatement = "DELETE FROM UserTable WHERE realname = \"" + John Smith + "\"";         return query_execute($sqlstatement);  } 

Now, compare the SQL-based PHP code in Listing 13 with the SPARQL-based PHP code in Listing 14 for the delete operation.


Listing 14. SPARQL for the delete operation
      function deleteUserInfo($uri) {     $sparqlstatement = "DELETE FROM GRAPH      {<" + $uri +     "> ?predicate ?object }";      return query_execute($sparqlstatement);  } 

Once again, for simplicity, the SPARQL version of the function takes in a URI, rather than the real name. You could use the DELETE SPARQL statement with a WHERE clause to find the data by name rather than URI, which would enable you to have an identical function header to the SQL version.

Update

Updating a record is fairly simple in both SQL and SPARQL through PHP. Just be aware of the different structures that relational databases and RDF provide. Listing 15 shows the SQL code for the update operation.


Listing 15. SQL for the update operation
      function updateUserInfo($realname, $location)     {         $sqlstatement = "UPDATE UserTable SET location = \"" + $location + "\" WHERE realname = \"" + $realname+ "\";";             return query_execute($sqlstatement);      } 

Now compare the SQL-based PHP code in Listing 15 with the SPARQL-based PHP code in Listing 16 for the update operation.


Listing 16. SPARQL for the update operation
      function updateUserInfo($realname,     $location) {      $sparqlstatement = " PREFIX foaf:  MODIFY  DELETE {?uri foaf:location ?location}  INSERT {?uri foaf:location \"" + $location + "\"} WHERE {      ?uri foaf:name \"" + $realname + "\"     }         ";         return query_execute($sparqlstatement);     } 

The function declarations in the SQL and SPARQL versions are identical. As a result, swapping from SQL to SPARQL is simple.

Although it's quite easy to move from a data source with an SQL endpoint to a data source with a SPARQL endpoint, there are two important areas you need to be aware of: levels of abstraction and the similarities and differences between SQL and SPARQL. You want to avoid that trapped feeling that comes from thinking that one language is practically identical to another. So it's best to understand a language's limitations while at the same time exploiting the language's features — in particular, the various syntax sugar and graphical representations provided by different languages.

Levels of abstraction

The example code in this article is strongly coupled to the database structure. In theory, there is only a certain level of loose coupling that SQL and RDBMS can cope with before table structure reorganization must occur. However, this is not the case with SPARQL and RDF. With SPARQL, your coupling with data can be very abstract because of the distributed but interlinked nature of the RDF. The reusability of the examples in this article could potentially be improved by increasing their abstractness. However, for our purposes, simple closely coupled functions are used to exemplify the similarities and differences between SPARQL and SQL.

As you can see from the example code, there are many similarities between SQL and SPARQL. The differences come in when understanding the Web-based, graphical, and object-oriented nature of RDF and how that filters into the SPARQL language. As a simplified rule of thumb, you can imagine the triple structure in RDF and SPARQL as basically representing, in order, the unique primary key of a row (the subject), the attribute/column name (the predicate or relationship) and the cell data that is based on the row and column (the object). In addition, SPARQL can take full advantage of HTTP communication, and, therefore, data can be (but doesn't have to be) distributed over intranets, extranets, and the wider Internet.

Why move from SQL to SPARQL?

There are many reasons why you would want to move from SQL to SPARQL. The details extend beyond the scope of this article, but you could be motivated by the following points:

  • You want a more distributed data solution.
  • You want to expose your data on the Web for people to use and link to.
  • You may find Node-Arc-Node relationships (triple) easier to understand than relational database models.
  • You may want to understand your data in a pure object-oriented fashion to work with an OOP paradigm (PHP V5 and later supports OOP).
  • You want to build generic agents that can connect to data sources on the Web.

Of course, there are also reasons why you may not want to move away from SQL, and they are probably perfectly valid reasons. SPARQL is an additional method of querying, not necessarily an immediate replacement of SQL. The same goes for relational data and Semantic Web-based data. These are not replacements. Instead, it's best to think in terms of merging newer and older techniques to produce a hybrid system that can handle and be handled by older legacy systems and the current and future systems.

curtsy: Daniel Lewis

No comments: