Darren's Developer Diary: DB

Thursday, June 16, 2011

PHP PDO: so hard to debug

I wrote a simple PDO helper function to update fields in a certain database table. The fields are given as the key/value pairs in $d, and my function looked like this:

$q='UPDATE MyTable SET lastupdate=:lastupdate';
foreach($d as $key=>$value)$q.=', '.$key.'=:'.$key;
$q.=' WHERE username=:username';

$statement=$dbh->prepare($q);
foreach($d as $key=>$value)$statement->bindParam(':'.$key,$value);
$statement->bindParam(':lastupdate',date('Y-m-d H:i:s'));
$statement->bindParam(':username',$username);

It all looks reasonable, doesn't it? Create the SQL, then assign the values to the placeholders. But it didn't work. My $d array looked like:

array( 'status'=>'expired', 'mode'=>'' )

Instead of getting set to expired, the status field ended up blanked out. Yet lastupdate and username got set. This had me scratching my head for ages.
PDO has a debug function (PDOStatement::debugDumpParams()) that is next to useless: it tells you the parameters, but not the values you've assigned to them. Incredibly annoying.

Have you spotted my bug yet?

Here's the answer. Though all the examples in the PHP documentation use bindParam(), the function to assign a value is bindValue(). You should always use bindValue(), unless you actually need the advanced functionality that bindParam() gives you. What advanced functionality, you wonder? Instead of assigning the value immediately, it attaches a reference, and uses whatever value that variable holds when execute() is called. You're ahead of me: in my foreach loop the $value variable changes on each iteration, so every placeholder bound in the loop ended up with the last value in $d (the empty string for mode), which is why status was blanked out.
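For reference, here is a minimal sketch of the same helper using bindValue(). The update_fields() wrapper name and the in-memory SQLite database are assumptions made so the snippet runs on its own; the table and column names follow the post.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real one.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE MyTable (username TEXT, status TEXT, mode TEXT, lastupdate TEXT)');
$dbh->exec("INSERT INTO MyTable (username, status, mode) VALUES ('darren', 'active', 'normal')");

// update_fields() is a hypothetical wrapper name; the body follows the snippet in the post.
// The keys of $d go straight into the SQL, so they must come from trusted code, never user input.
function update_fields(PDO $dbh, $username, array $d)
{
    $q = 'UPDATE MyTable SET lastupdate=:lastupdate';
    foreach ($d as $key => $value) $q .= ', ' . $key . '=:' . $key;
    $q .= ' WHERE username=:username';

    $statement = $dbh->prepare($q);
    // bindValue() copies the value now, so reusing $value on the next iteration is harmless.
    foreach ($d as $key => $value) $statement->bindValue(':' . $key, $value);
    $statement->bindValue(':lastupdate', date('Y-m-d H:i:s'));
    $statement->bindValue(':username', $username);
    $statement->execute();
    return $statement->rowCount();
}

$changed = update_fields($dbh, 'darren', array('status' => 'expired', 'mode' => ''));
$row = $dbh->query("SELECT status, mode FROM MyTable WHERE username='darren'")->fetch(PDO::FETCH_ASSOC);
```

Using bindValue() for lastupdate also avoids passing the result of date() by reference, which bindParam() expects to be a variable.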

If PDO had a decent debug function I'd have discovered that in half the time. Oh well, now I know!

Tuesday, May 24, 2011

PHP, PDO, SQLite, mysterious lock problem

Let's start with the conclusion:
If you do a prepare() or query() with PDO and sqlite, and then want to do something else with sqlite in the same PHP function, unset the first PDOStatement before trying to do that something else.

As an example, here is my code to get a unique ID ($dbh is a PDO connection to an sqlite database):

function get_next_id($dbh){
$q='SELECT next FROM MyNextId';
$obj=$dbh->query($q); //Throws on error
$d=$obj->fetch(PDO::FETCH_NUM);
$next_id=$d[0];

$q='UPDATE MyNextId SET next=next+1';
$row_count=$dbh->exec($q); //Throws on error
if($row_count==0)throw new Exception("Failed to execute ($q)\n");

return $next_id;
}

It works on Ubuntu 10.04, with sqlite 3.6.22, but fails on Centos 5.6, with sqlite 3.3.6, with this message:
exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 6 database table is locked'

I went through the whole changelog from 3.3.6 to 3.6.22, but got no clues (though I am now impressed with how active and organized the sqlite development is). But finally I tracked down this article on someone getting similar errors.

And that was it. I could have used $obj->closeCursor(), but deleting $obj is just as good:

function get_next_id($dbh){
$q='SELECT next FROM MyNextId';
$obj=$dbh->query($q); //Throws on error
$d=$obj->fetch(PDO::FETCH_NUM);
$next_id=$d[0];

unset($obj);

$q='UPDATE MyNextId SET next=next+1';
$row_count=$dbh->exec($q); //Throws on error
if($row_count==0)throw new Exception("Failed to execute ($q)\n");

return $next_id;
}

If you are doing just one PDO action per function then there is no need, because exiting the function will automatically do the unset.
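The closeCursor() alternative mentioned above would look roughly like this. It is a sketch run against an in-memory SQLite database (an assumption, so the snippet is self-contained); it will not reproduce the lock error on a recent sqlite, it only shows where the call goes.

```php
<?php
// Sketch only: in-memory SQLite, with the MyNextId table from the post seeded at 1.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE MyNextId (next INTEGER)');
$dbh->exec('INSERT INTO MyNextId (next) VALUES (1)');

function get_next_id(PDO $dbh)
{
    $statement = $dbh->query('SELECT next FROM MyNextId'); // Throws on error
    $d = $statement->fetch(PDO::FETCH_NUM);
    $statement->closeCursor(); // Release the read cursor before writing to the same table
    $next_id = (int)$d[0];

    $q = 'UPDATE MyNextId SET next=next+1';
    $row_count = $dbh->exec($q); // Throws on error
    if ($row_count == 0) throw new Exception("Failed to execute ($q)\n");

    return $next_id;
}

$first = get_next_id($dbh);
$second = get_next_id($dbh);
```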

(I don't know why this is a problem on sqlite 3.3.6 but not sqlite 3.6.22... in fact, I suspect it may be due to a difference in the PDO or PHP version or configuration instead. Apologies for the loose end!)

Tuesday, June 15, 2010

doctrine: delete object but leave it in database

I've a website where I make a page from a doctrine object which is taken from a database. Nothing special there. For adding the data to the database I have a form that goes to a preview page then a confirm button that actually writes it. For the preview page I share the same display logic, so I construct a doctrine object and simply don't save it. For editing, I do the same: I load the doctrine object from the database, make the changes specified on the form, and then either show the preview page or save to the database (depending on if preview or confirm was clicked).

The problem came with editing a one-to-many relation. As a concrete example let's say the user is registering multiple email addresses. I was doing this:
   $User->Emails->delete();

Then doing this for each address the user gave:
   $User->Emails[]=new Email(...);

It nicely handles when the user has removed an email address, changed one, or added an extra one. But when I realized the flaw I slapped my forehead so hard I gave myself whiplash. Do you see it? Have a moment...

Yep, if they click cancel (or leave before clicking confirm) they're expecting nothing has changed, whereas in fact all the email addresses they'd previously input have vanished.

The problem is that delete() happens immediately, rather than waiting for me to call save(). I hunted high and low for a way to stop that. My final solution was:
   if(confirmed)$User->Emails->delete();
   else $User->unlink('Emails');

I.e. unlink() appears to be the delete-that-does-not-touch-database I was searching for. (Of course, don't do something silly like go and call save() now, or the user will lose their email addresses and you'll have some orphaned records in your Emails table.)

Incidentally this did not work:
$user->clearRelated('Email');

I'd hoped it would, after reading the description in the manual. But in fact it did nothing at all.

Tuesday, March 23, 2010

The Dangers Of GroupBy

I had a complex join (three tables) with a group by and a where clause with three conditions. It seemed to be working, until I added more data; I eventually narrowed the problem down to just a single table. Consider this query:


SELECT w.user_id, w.some_date, w.id
FROM mytable w

id is the primary key. That query returns all the data:


user_id   some_date   id
   1      2010-02-01   1
   5      2010-02-02   2
   1      NULL         3
   1      2010-03-22   4
   1      2010-03-24   5

I'm only interested in the record with the latest date for each user, so I add a MAX on some_date and GROUP BY user_id:


SELECT w.user_id, MAX(w.some_date), w.id
FROM mytable w
GROUP BY w.user_id

That gives:


user_id   MAX( w.some_date )   id
   1       2010-03-24           1
   5       2010-02-02           2

Hang on! The dates are right, but 2010-03-24 is from id=5, not id=1. The some_date column has been max-ed in isolation.

This issue is explained in an excellent two-part article: part1 part2

In fact it is so excellent I didn't fully understand it, and am going to re-read it three times a day until I do. I really need to understand it because, as the author shows, it is very easy to get correct answers from a query on your test data, pass all your unit tests, and deliver something that goes wrong when one more record is added.

My query is simpler than the one in his example, but after many tries I cannot find a solution. The "join(...)as d" syntax does not seem to work in MySQL or I am not using it correctly. It seems what I'm trying to do is very basic, so I do not understand why I cannot find more advice on the subject. I'm open to suggestions!

(For the moment I'm going to give up, do a simpler query and have PHP process the results to get the data I actually want.)
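One common way to get the whole row holding each user's latest date is to join the table back to a grouped subquery. The sketch below runs it from PHP against an in-memory SQLite copy of the sample data (an assumption made so it can be tried anywhere); the join itself is plain SQL and should carry over to MySQL, though I have only reasoned about it here, not run it there.

```php
<?php
// Sketch only: the five sample rows from the post, loaded into in-memory SQLite.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE mytable (id INTEGER PRIMARY KEY, user_id INTEGER, some_date TEXT)');
$dbh->exec("INSERT INTO mytable (id, user_id, some_date) VALUES
    (1, 1, '2010-02-01'), (2, 5, '2010-02-02'), (3, 1, NULL),
    (4, 1, '2010-03-22'), (5, 1, '2010-03-24')");

// The subquery d finds the latest date per user; the join then picks the
// complete row that carries that date, so id comes from the same record.
$q = 'SELECT w.user_id, w.some_date, w.id
      FROM mytable w
      JOIN (SELECT user_id, MAX(some_date) AS max_date
            FROM mytable
            GROUP BY user_id) d
        ON d.user_id = w.user_id AND d.max_date = w.some_date
      ORDER BY w.user_id';
$rows = $dbh->query($q)->fetchAll(PDO::FETCH_ASSOC);
```

With the sample data this returns id 5 for user 1 and id 2 for user 5. If a user has two records on the same latest date, both come back, so a tie-break is still needed in that case.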

Tuesday, March 2, 2010

Creating A Doctrine Custom Behaviour, Part 2

In part 1 we made a simple custom behaviour and then gave it some options we could customize in our schema. The second half of the power of Doctrine's behaviours though is being able to set values. First, uncomment this line in the last example in part 1:


    $this->addListener(new DarrenTestableListener($this->_options));

You should get told class DarrenTestableListener does not exist, so create DarrenTestableListener.php in the same directory as DarrenTestable.php (though, as mentioned before, it can go anywhere in your models directory tree), and fill it with this code:


class DarrenTestableListener extends Doctrine_Record_Listener
{
    protected $_options = array();

    public function __construct(array $options)
    {
        $this->_options = $options;
    }

    /** Called when a new record is created */
    public function preInsert(Doctrine_Event $event)
    {
        $name = $event->getInvoker()->getTable()->getFieldName($this->_options['name']);
        $modified = $event->getInvoker()->getModified();
        if ( ! isset($modified[$name])) {
            $event->getInvoker()->$name = "C";
        }
    }

    /** Called when an existing record is updated  */
    public function preUpdate(Doctrine_Event $event)
    {
        $name = $event->getInvoker()->getTable()->getFieldName($this->_options['name']);
        $modified = $event->getInvoker()->getModified();
        if ( ! isset($modified[$name])) {
            $event->getInvoker()->$name .= 'U';
        }
    }

    /** Handle dql update queries */
    public function preDqlUpdate(Doctrine_Event $event)
    {
        $params = $event->getParams();
        $name = $event->getInvoker()->getTable()->getFieldName($this->_options['name']);
        $field = $params['alias'] . '.' . $name;
        $query = $event->getQuery();
        if ( ! $query->contains($field)) {
            $query->set($field, '?', 'U');
        }
    }

}

It has default options (and a constructor to set the options), then one function to handle INSERTs and two functions to handle UPDATEs. (There are also hooks for DELETE, save and validation.) All three functions follow a similar pattern:

Get the real fieldname
See if we need to do anything
If so, set the field

What we do is set the field to "C" when it is created. Then each time it is updated we append a "U". E.g. after three updates it will look like "CUUU". (TODO: I'm not so familiar with DQL, and I am not using it, so the DQL version replaces the existing data instead of appending. If you can supply the proper code for it let me know!)

Incidentally if you set a default in the options that will get precedence over your code (that is what the getModified() call is for, I assume). E.g. if I change my YAML schema to:


ActsAs:
    DarrenTestable:
        name: being_silly
        options:
            default: Hello!

Then after a couple of updates being_silly will contain "Hello!UU".

Creating A Doctrine Custom Behaviour, Part 1

Doctrine is a wonderful ORM library, but the manual is..., well, let's just say I think it was written by the developers, and out of obligation :-)

Behaviours are a motivating feature for using Doctrine, so here is a two-part tutorial on how to add your own minimal behaviour. The first part will show how to add a field to a table. The second part will show how to have its contents set automatically. For a more in-depth example see here or study the source code of the core behaviours.

The first thing you need to know is to put your behaviour classes in your models directory, or a sub-directory of your choice. You don't need a directory structure that matches the "XXX_YYY_" prefixes of the class name. This is assuming you are using Doctrine's auto-loading of table classes; if not you can put them anywhere you like of course. I'm putting them in models/behaviors/ (yes, using American spelling!)

Make a file called "DarrenTestable.php" with these contents:


class DarrenTestable extends Doctrine_Template{
    public function setTableDefinition(){
        $this->hasColumn('darren_testing','string');
    }
}

See, I told you it was minimal! To add it to an existing class, add this to your YAML schema:


ActsAs:
    DarrenTestable

Recreate the class from the schema and you should see a field called "darren_testing". On MySQL the type is "text" (i.e. this is what "string" maps to if you are using MySQL).

Even that very small example is doing something useful: if this behaviour is used in 10 classes and we need to change the name or data type (or set a default, add an index, etc.) we only need to edit it in one place.

While still staying within the scope of a minimal example, we can expand it in two directions. We can have our code automatically set the field value (see part 2 of this article), and we can allow users to customize it. Here is how we add some options, with defaults:


class DarrenTestable extends Doctrine_Template
{
    protected $_options = array(
        'name'=>'darren_testing',
        'type'=>'string',
        'options'=>array(),
        );

    public function setTableDefinition()
    {
        $this->hasColumn($this->_options['name'],
            $this->_options['type'], null, $this->_options['options']);
        //$this->addListener(new DarrenTestableListener($this->_options));
    }
}

I have added the call to the listener we'll use in part 2, but commented it out for the moment. The $_options array contains a sub-array called options, which is passed on to Doctrine_Table::setColumn() and can be used to set defaults and so on.

Usage is exactly like above if we want the defaults. If we want to change the name, and require it to be set (i.e. NOT NULL), we would do:


ActsAs:
    DarrenTestable:
        name: being_silly
        options:
            notnull: true

Saturday, January 16, 2010

Zend Framework

I've been hearing a lot about Zend Framework over the past year or two. It seemed to have become the de facto standard framework (something else I've been hearing a lot about) and I felt a need to get up to speed on it.

I put up my review of Easy PHP Websites with the Zend Framework. I bought the book a few months back, as it seemed the best of the choices at the time. Quick summary of the review: Okay book, not impressed by Zend Framework, last third of the book is useful and interesting.

I'll talk about not being impressed by the DB part of Zend below, but the other main aspect that bothered me was the way data is given to and then used in views. In the controller I have to write:
$this->view->something=$mymodel->get_something();

Then in the view it is used as:
Our something is <?php echo $this->something;?> today!

All that $this-ing and view-ing makes the code verbose, hiding the real thing I'm trying to say with my code. In particular the view code is ugly - ZF has not helped at all compared to writing plain PHP.

Why do I care? Try getting a designer to edit a view that looks like that. They may be comfortable with HTML but will go pale and faint at the sight of a dollar sign let alone all the surrounding junk. I can tell you from experience they will refuse to even edit a file that contains text that looks like that, let alone type it themselves.

I was also not impressed with how Zend Form does input validation. It seems to be just as much work as doing it in plain PHP. If I add a dependency I want value for money!

Back to databases. I recently wrote about being impressed with Doctrine ORM. Zend DB didn't impress me at all; I already use Pear::DB to abstract away the database, and all Zend adds is having to use PHP instead of SQL to write common queries, while still having to use SQL to write the less common ones.

A web search found I wasn't alone in this opinion, but what was interesting was I found people using Doctrine within Zend Framework, as a replacement for Zend DB.

To quote an article by Ruben Vermeersch: "While Zend_Db isn't a bad technology, it is still quite low-level and close to the underlying database. Using Doctrine, you can manipulate your data like objects, without worrying about the database too much." The article is pro-Zend and pro-Doctrine, and shows how to use the two together. Good clear article.

Here is another example of using Zend and Doctrine together. It is just example code, not a teaching article, but interesting as it also uses ajax (using jquery).

Here is an article comparing Symfony (using Doctrine) with Zend, with a pro-Symfony slant. Also check out the comments, especially the reply by Matthew Weier O'Phinney describing what is coming up in Zend (including how it may have more formal integration with Doctrine at some point, and hopefully auto-admin pages and auto-forms-from-schema).

By the way, the afore-mentioned Ruben Vermeersch article suggests these articles to get up to speed on Zend:
http://framework.zend.com/docs/quickstart
http://akrabat.com/zend-framework-tutorial/

I'd like to stress again that my views of Zend Framework are currently only based on studying it; I'll get back to you once I've tried it out on a full-scale real-world project.

Monday, January 4, 2010

Doctrine ORM

I mentioned before, when talking about my Step-By-Step Regex article in php|a magazine, that the same issue (August 2009) had some other interesting articles.

One is about Adminer, an alternative to phpMyAdmin which is designed to be very compact and exists in just one file (so it is easy to upload temporarily to a production server for instance); the article is mainly about how Adminer was written, which was interesting.

But the article - and library - I most want to talk about is Doctrine ORM. The main problem this library is solving is this: when you add a field to your database you have to make the change in at least two places, your PHP script and your database. That sucks. The article describes itself as a tour around Doctrine's more advanced features, but I found I didn't need any other introduction, and it sold the features in a very clear and logical way.

The first thing I liked was you can describe your database schema in any of three ways: SQL, PHP, or the compact YAML files, and utilities are provided to convert between all three. By SQL I mean an existing database, that you could have been creating interactively from phpMyAdmin, or it could be a legacy database that has been around for years and now needs to be connected to a php script.

Only the YAML and PHP schema formats can specify what Doctrine calls behaviours. By adding the Timestampable behaviour it will add created_at and updated_at fields to the database and keep them up to date for you. The Sluggable behaviour is for making URLs and is demonstrated in the php|a article, but others that caught my eye are SoftDelete (records just get the deleted_at field set when you delete, rather than physically being removed), and Versionable (each time a record is changed the previous record is saved, and can be reverted to). NestedSet helps build trees, I18n is for holding translations (see below), Searchable does indexing, Geographical can find nearby places.

Incidentally I covered I18n in a previous article and Doctrine is using type 1, but without the language table to store collation. There is no consideration of different collations, and no concept of default language. It is sufficient for most purposes though.

Looking at some behaviours not in the Doctrine core distribution, Taggable looks interesting, allowing you to add blog-style tags. Locatable apparently ties in with Google Maps to get latitude and longitude automatically given an address (it could then be used, for instance, with the Geographical behaviour to automatically find all people within 10km of you based on just their address). EventLoggable logs all actions (select, update, delete, etc.) to a disk file. Blameable records (in the table) who last made any change to a record.

Doctrine's behaviours feature is powerful, portable, the existing ones are customizable (all the above seem to have sensible defaults but also can have their behaviour fine-tuned) and new ones look easy to write.

The third big feature of Doctrine that I like is the way data is retrieved on demand, and how that ties in with table relations (see listing 6 in the php|a article). And if this becomes inefficient you can optimize without having to change much code (listing 7). I'm not a big fan of using PHP to write SQL queries. Generally to do anything interesting you end up dropping down to writing SQL anyway. But I was impressed with the example in the article, and you only need to start explicitly describing joins if you have profiled and found you need to optimize.

(By the way, it goes without saying that Doctrine ORM is database-neutral, not tying you into MySQL or any other database.)

The final cool Doctrine feature is migrations. When you make a change to the schema in either your YAML file or your PHP file you run a script and it will make a migration file. This can then be run to move a database between versions (backwards or forwards). I admit I currently do this with some hand-crafted SQL scripts, which are one way only, and painful. And sometimes I'm not even that organized.

I've not used Doctrine ORM for a real project, but I expect I will try it out very soon.

Monday, November 23, 2009

Escaping CSV in C++

There are two escaping rules for each field in a comma-separated value row:
1. Change each double quote to two double quotes.
2. Surround with double quotes if the field contains a comma or double quote.

These are the rules used by Excel and most other software that deals with CSV data.

As an example, if my fields are:
hello world
a,b,c
"CSV" is popular
""

Then it becomes:
hello world,"a,b,c","""CSV"" is popular",""""""

C++ has a justified reputation as a hard language for text manipulation. Boost has libraries to make it a little easier, but I didn't want to add Boost as a dependency for a project I was working on. Fortunately std::string's replace() function turned out to be more powerful than I had realized:


#include <ostream>
#include <string>

void output_csv(std::ostream &out,std::string s){
if(s.find('"')!=std::string::npos){  //Escape double-quotes
    std::string::size_type pos=0;
    while(1){
        pos=s.find('"',pos);
        if(pos==std::string::npos)break;
        s.replace(pos,1,"\"\"");
        pos+=2; //Need to skip over those two quotes, to avoid an infinite loop!
        }
    out<<'"'<<s<<'"';
    }
else if(s.find(',')!=std::string::npos){ //Need to surround with "..."
    out<<'"'<<s<<'"';
    }
else out<<s;   //No escaping needed
}

If you like compact code then the while loop can be rewritten:


void output_csv(std::ostream &out,std::string s){
if(s.find('"')!=std::string::npos){  //Escape double-quotes
    for(std::string::size_type n=0;(n=s.find('"',n))!=std::string::npos;n+=2)s.replace(n,1,"\"\"");
    out<<'"'<<s<<'"';
    }
else if(s.find(',')!=std::string::npos)out<<'"'<<s<<'"';
else out<<s;
}

P.S. If you need to do the same in PHP, PHP 5.1 finally introduced fputcsv for it. The comments on that page show how to do it in older versions of PHP; my fclib library also contains functions for it.
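For comparison, the same two rules fit in a few lines of plain PHP. This is only a sketch: csv_escape() and csv_row() are hypothetical helper names (not part of PHP or fclib), and like the C++ version it ignores fields containing line breaks.

```php
<?php
// csv_escape() applies the two rules from the top of this post to one field.
function csv_escape($s)
{
    if (strpos($s, '"') === false && strpos($s, ',') === false) return $s; // No escaping needed
    return '"' . str_replace('"', '""', $s) . '"'; // Double the quotes, then surround
}

// csv_row() escapes each field and joins them with commas.
function csv_row(array $fields)
{
    return implode(',', array_map('csv_escape', $fields));
}

// The four example fields from earlier in the post.
$line = csv_row(array('hello world', 'a,b,c', '"CSV" is popular', '""'));
```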

Sunday, October 25, 2009

Regex Article in php|a

The August edition of PHP Architect magazine contained my introductory article on regexes. It has a PHP slant but mainly it is about regexes. Well, most of it is introductory but it contains a bonus section at the end using advanced regexes in SQL to repair a database in situ.

The print magazine arrived by air mail a couple of weeks ago and I finally got to read it. I was pleased (and relieved) that it read well. If you have read it I'd love to get some constructive criticism - especially if you are uncomfortable with regexes.

It was also my first print article to use colour, and I think that worked well too.

One correction in the complicated "Repairing With Regexes" section at the end of the article. The text says "So \1 has to be written \\\\1 (that is four backslashes)." That seemed a strange thing to write, so I took a look at the unit test source code (included with the PDF download version of php|a magazine) I had written. Four backslashes is correct when in PHP, but when in SQL it should be "\\1" not "\1".
(I just checked my emails from when we were proof-reading, and one of the edits had removed all those backslashes; I caught that at the time but ironically I got it wrong when putting them back in.)
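The PHP half of that correction is easy to check. A small sketch, where $replacement is just an illustrative variable name: four backslashes typed in the PHP source leave two in the string that is actually sent on to SQL.

```php
<?php
// Four backslashes in the PHP source; PHP collapses each pair into one.
$replacement = '\\\\1';

// What the string really contains: backslash, backslash, the digit 1.
$length = strlen($replacement);
$expected = chr(92) . chr(92) . '1';
$matches = ($replacement === $expected);
```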

The other minor correction I only realized after reading Arne Blanket's column in the same magazine issue! I had written: "...an IP address after that at sign, such as darren@10.0.0.1, which, while unusual, is technically valid". In fact "darren@[10.0.0.1]" is the technically valid form. Which is doubly annoying because my article's regex would reject those square brackets and I hadn't explicitly pointed that out. (No harm done, though, as this email form is highly discouraged.)

Arne's column also points out that top-level domains are no longer just two or three characters. So my '\.[a-zA-Z]{2,3}$' suggestion should really have been '\.[a-zA-Z]{2,}$'. Luckily, that suggestion was just in a list of other ideas, not part of the article's main regex.
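A quick sketch of the difference between the two patterns, using made-up host names purely for illustration:

```php
<?php
$old = '/\.[a-zA-Z]{2,3}$/'; // two or three letters only
$new = '/\.[a-zA-Z]{2,}$/';  // two or more letters

// Hypothetical host names with 3, 4 and 6 letter top-level domains.
$hosts = array('example.com', 'example.info', 'example.museum');
$results = array();
foreach ($hosts as $host) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$host] = array('old' => preg_match($old, $host), 'new' => preg_match($new, $host));
}
```

The old pattern only accepts example.com; the new one accepts all three.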

(By the way I enjoyed, and will blog about real soon, some other articles in this particular issue; if you are not a subscriber, and your work involves data and PHP, then this is the back issue you should get!)

Thursday, August 20, 2009

Storing website translations in SQL

In php|a magazine, April 2009, there was an article called "Storing Multilingual Records in the MySQL Database". As the title says, it has some mysql-specific elements, but the concepts are quite general. The author introduced and compared three alternatives; as I have used a fourth way I thought I'd write about it here.

The situation was a product database, where the name, url and description have to be localized. But some entries will be untranslated and should fall back to using the default language entry.

1. Translations in a separate table

Here is the database schema:

product: product_id, group_id, price.
product_translation: product_id (links to product), language_id (links to language), name, url, description.
language: language_id, collation.

This requires an SQL join which is hard for mysql to execute efficiently. Fulltext indexes are also a problem, as the name and description fields contain text from different languages. The only good point of this approach is that it is easy to add a new language.

2. Data Copy

Here the group_id and price fields have been moved into the product_translation table. That makes the SQL queries a bit cleaner, as it saves one join, but doesn't really solve the other problems. And the data redundancy is an accident waiting to happen (I'm having a physical reaction just thinking about it).

3. Translation Directly In The Database

This is like 2 above, but each language gets its own field. For instance if we have three languages, English (en), German (de) and Japanese (ja) then it looks like:

product: product_id, language_id (links to language), name_en, url_en, description_en, name_de, url_de, description_de, name_ja, url_ja, description_ja, price, group_id.

The advantage this brings is that using the default language is easy; you can either always select the default language field as well, or use mysql's IFNULL function. E.g. either:
SELECT name_de,name_en FROM product WHERE product_id=123;
(and then check in your PHP script to see if name_de is blank, and if so use name_en instead)
Or:
SELECT IFNULL(name_de,name_en) as name FROM product WHERE product_id=123;

The other advantage is FULLTEXT indices work well now, as only one language is kept in each text column. Disadvantages are the space-wasting if actual translations are sparse, the work required to add a new language, and possibly hitting the mysql maximum row size (64K).

4. One table per language

This is the approach I've used in MLSN, among others:

product: product_id, group_id, price.
product_translation_en: product_id, name, url, description.
product_translation_de: product_id, name, url, description.
product_translation_ja: product_id, name, url, description.

Fulltext indices work well, joins are relatively easy, no data duplication. Adding a language feels cleaner than solution three above, as it doesn't require modifying an existing table, just adding a new one. If the site takes off then different languages can be split to different servers easily. And the language table can disappear as collation will now be part of the field definition when the table is created. (I think the same can be said of solution three, but the php|a article kept the language table in its schema - was that an oversight, or am I missing something?)

The dark side? Getting the default language requires two queries. Or would something like this (untested) SQL work?
SELECT IFNULL(de.name,en.name) FROM product_translation_de de,product_translation_en en WHERE de.product_id=123 AND en.product_id=123

Thoughts?

The 4th option works well for me, but I'd be interested to hear arguments against it. Perhaps you use something different again?
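On the single-query question for option 4: a plain comma join between the two translation tables returns no row at all when the German record is missing, so the fallback never gets a chance. A LEFT JOIN starting from the default-language table avoids that. Here is a sketch against in-memory SQLite (an assumption so it runs standalone; COALESCE is standard SQL, and get_name() and the sample products are made up for illustration). It does assume every product has an English row.

```php
<?php
// Sketch only: two per-language tables, with product 124 missing its German translation.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE product_translation_en (product_id INTEGER PRIMARY KEY, name TEXT)');
$dbh->exec('CREATE TABLE product_translation_de (product_id INTEGER PRIMARY KEY, name TEXT)');
$dbh->exec("INSERT INTO product_translation_en VALUES (123, 'Chair'), (124, 'Table')");
$dbh->exec("INSERT INTO product_translation_de VALUES (123, 'Stuhl')");

// get_name() is a hypothetical helper: German name if present, else the English one.
function get_name(PDO $dbh, $product_id)
{
    $statement = $dbh->prepare(
        'SELECT COALESCE(de.name, en.name) AS name
         FROM product_translation_en en
         LEFT JOIN product_translation_de de ON de.product_id = en.product_id
         WHERE en.product_id = :id');
    $statement->bindValue(':id', $product_id, PDO::PARAM_INT);
    $statement->execute();
    return $statement->fetchColumn();
}

$translated = get_name($dbh, 123);
$fallback = get_name($dbh, 124);
```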

Monday, March 30, 2009

6GB on 32-bit linux

Well, it all started when a friend said their SMT (statistical machine translation) system was ready to download and install. He then casually mentioned it is a bit of a memory hog, "4GB minimum, 8GB preferred".

Wow. I looked at my up-until-then-perfectly-adequate-some-might-even-say-overkill 2 gigabytes and felt like a salesman who had just been told he ought to upgrade his sensible car for a sports car in order to look more successful!

I have dual-channel, and two slots free. Another 4GB was under 5000 yen at Dospara (Japanese only), where I had bought this computer, so I emailed them to confirm it would work. After a few emails, and a bit of research, I found out:
* Windows 32-bit will only go up to 3GB.
* 64-bit Windows and 64-bit Linux are both fine well beyond 8GB.
* 32-bit linux will allow up to 4GB of address space per process (of which about 3GB is usable by the program itself), and can use more than 4GB altogether.

(I wanted to stick with 32-bit linux, as Flash is critical to my work and has no 64-bit version.)

Dospara were rather cautious, saying they don't support linux, but I went for it. When I plugged in the extra 4GB, the bios correctly recognized 6GB. Then Ubuntu said I had 3GB. But that was okay, as I'd been expecting it. I went to the package manager and selected the "linux-server" meta-package (its kernel is built with PAE, which is what lets a 32-bit system address more than 4GB), then rebooted.

Drum roll please: "free" reports I have 6GB available. I'm using 475 MB, and have 5,745 MB free. See, I told you I didn't need it. But this is city driving. You wait until I take this beast down the local Formula One track, otherwise known as Difficult AI Problems.

Oh, while I was there I also bought a 1TB SATA drive. Yes, that is a "T". A whole terabyte in a little, diddy box. It was only 10,000 yen (you could get one for 7,880 yen if you go for Seagate, but a quick bit of research showed lots of unhappy people, so I went for Western Digital which seemed to be the reliability brand).

What do you mean: "I bet he doesn't need that much storage either"? Just because my current 250GB drive still has 143GB of free space after 18 months, doesn't mean I suddenly won't need more capacity...

And you can bet that when the ladies hear I am part of the Terabyte Generation I'm going to be fighting them off with a stick. Oooh, yeah! I am so sexy.

Sunday, March 8, 2009

What is an ontology?

I went to InterOntology 09, at Keio University a week ago. Actually I only attended the first couple of talks and the (free!) banquet; I couldn't make the other talks. Ontology is one of those words I have had trouble grasping, and I attended with no more ambition than understanding what it means.

My ambition was not fulfilled, but I was relieved to know that most attendees were just as confused as me. Okay, relieved is perhaps the wrong word. And by "confused", I don't mean people stood around scratching their heads. I mean people were using it in different ways. Most people who said they were "building ontologies" were actually building databases.

Someone I met there kindly sent me this link:
What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model?

This (in conjunction with its first comment) is an excellent article. A bit heavy, but definitely worth the effort.

I feel it backs up the opinion I've been forming that ontology is a word that does not really need to exist. People "building ontologies" are generally data modeling, or taxonomy-building, or semantic-network-building. These terms are more specific, and contain more information about exactly what you are doing.

People using the word ontology may want to emphasize that the grammar allows for validating models using logic. Personally I would rather call that validating the data model, though I do see how a widely used ontology representation language could encourage high quality data validation tools. But they are nowhere near that yet - using SPARQL (pronounced sparkle) appears to be more like writing in assembler than the SQL its name hints at.

On the other hand, if everyone listened to me there would have been no InterOntology conference, and I'd have missed out on a free dinner. Such things need to be taken into consideration. Perhaps I should add "Professional Ontologist" to my business card after all? ;-)

Tuesday, March 3, 2009

Specify the default, stupid!

I've been struggling all week to connect to a Microsoft SQL Server. It was SQL Express. I'm using the "ADODB.Connection" COM object from PHP.

(The reason to use that is that you can pass CP_UTF8 (without quotes) as the 3rd param when creating the COM object, and then it will convert between UTF-8 and the UCS-2 that SQL Server works in, all behind the scenes.)

First, if it says it cannot connect due to there being no server: you have to enable TCP/IP connections in SQL Server. They are off by default.

That got us to the next error, which said "接続が正しくありません。" which translates as "The connection is incorrect." Not much to go on. I think this may be the same as "error 26", but I'm not 100% sure on that.

My code could connect fine to the production DB machine, running SQL workstation (which is the more expensive version I believe). And a client on another machine, which could also connect to that production DB, could not connect to our SQL Express machine.

So, with all fingers pointing at the SQL Express server, we tried things and googled things and swore at things. Wrong barking tree!

I've given the answer in the subject. SQL Server listens on port 1433 by default. Our SQL server was listening on that port, so you would think no need to specify it. But this DSN fails:

DRIVER={SQL Server};SERVER={127.0.0.1};UID={myuser};PWD={mypassword};DATABASE={mydatabase}

whereas this one works:

DRIVER={SQL Server};SERVER={127.0.0.1,1433};UID={myuser};PWD={mypassword};DATABASE={mydatabase}

Only for SQL Express it seems. No need to specify the default port when connecting to SQL workstation.
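
One way to stop this biting again is to always spell the port out when building the connection string. Here is a minimal sketch of that idea. The function name buildMssqlDsn() and its parameters are hypothetical, made up for illustration, and it assumes none of the values contain a closing brace:

```php
<?php
// Hypothetical helper (not part of any library): builds the connection
// string with the port always included, even when it is the default 1433.
// Assumption: no value contains a "}" character, so no escaping is done.
function buildMssqlDsn($host, $user, $password, $database, $port = 1433)
{
    $parts = array(
        'DRIVER'   => 'SQL Server',
        'SERVER'   => $host . ',' . (int)$port, // "host,port" is the form that worked
        'UID'      => $user,
        'PWD'      => $password,
        'DATABASE' => $database,
    );

    $pairs = array();
    foreach ($parts as $key => $value) {
        $pairs[] = $key . '={' . $value . '}';
    }
    return implode(';', $pairs);
}

// Prints the same string as the working DSN above.
echo buildMssqlDsn('127.0.0.1', 'myuser', 'mypassword', 'mydatabase'), "\n";
```

The returned string is what you would hand to the Open call on the ADODB.Connection object.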

Nice one Microsoft. Just got to work on quality control and documentation a bit more, oh, and consistency, and then they could probably go professional with this software business of theirs.

Wednesday, October 15, 2008

Microsoft SQL server and PHP

I'm a big fan of PEAR::DB. (And, as an aside, I think the PEAR decision to deprecate it was a bit strange.) But it didn't work for me with Microsoft SQL Server (aka mssql) on one particular project (I'm fairly sure it has worked on other projects, so I think it was something specific to this environment). What did work was using the COM object.

COM objects from PHP are a bit of a black art, perhaps because most code examples seem to be in Visual Basic, which has a very different syntax. I found the following page useful, as it shows all three ways of running the same query for a mssql database:
http://www.webcheatsheet.com/PHP/connect_mssql_database.php

I think I also used the user comments in the online PHP manual.

One thing the above URL does not show is error reporting. Here is the code I developed, which also takes care of not treating 5701 and 5703 as errors (congrats to Microsoft on another poorly thought out API!):

//This is the body of a function: it returns "" on success, else a description of the errors.
//$db is the return from the new COM() call.
//...run some Open or Execute command on $db...
$errors=$db->Errors;
if($errors->Count==0)return ""; //It worked
$e='';
foreach($errors as $error){
    if($error->NativeError==5701 || $error->NativeError==5703)continue; //Information messages, not errors
    $e.="Number={$error->Number}, Source={$error->Source}, SQLState={$error->SQLState}, NativeError={$error->NativeError}, Description={$error->Description}\n";
}
return $e; //Still "" if only information messages were found
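
If you want to try the filtering logic without a SQL Server to hand, the same idea can be pulled into a standalone function that accepts anything foreach can walk. This is only a sketch: the function name describeRealErrors(), the $ignored parameter, and the stand-in error objects are all hypothetical, and it reports just two of the properties to keep it short:

```php
<?php
// Hypothetical helper: $errors is anything foreach can iterate, e.g. the
// Errors collection of an ADODB.Connection, or an array of stand-in objects.
// $ignored lists NativeError codes to treat as information, not failures.
function describeRealErrors($errors, $ignored = array(5701, 5703))
{
    $lines = array();
    foreach ($errors as $error) {
        if (in_array((int)$error->NativeError, $ignored, true)) {
            continue; // information message, skip it
        }
        $lines[] = "NativeError={$error->NativeError}, Description={$error->Description}";
    }
    return implode("\n", $lines); // empty string means nothing went wrong
}

// Stand-in objects with only the two properties the function reads.
$fake = array(
    (object)array('NativeError' => 5701, 'Description' => 'Changed database context'),
    (object)array('NativeError' => 18456, 'Description' => 'Login failed'),
);

echo describeRealErrors($fake), "\n"; // prints: NativeError=18456, Description=Login failed
```

Keeping the ignored codes in a parameter means another harmless message number can be added later without touching the loop.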
