Thursday, January 17, 2008

Friday, January 11, 2008

Artificial intelligence

Foreword

There is no dearth of natural language processing and artificial intelligence blogs.



Some promote their companies. Others, mostly students, put their newfound wisdom on record. Yet others give their views or report on conference proceedings.



My boss once advised me, 'Sail with the wind.' I said, 'Happily, provided it takes me where I want to go.' So if you know where you want to go, you ask, 'OK, how do I get there?'


In software, or for that matter in any engineering effort, one wants to avoid 'reinventing the wheel' as far as possible. But if the existing wheels don't do what you want, you have little choice. Either you say 'can't do', or you say 'Let's take the bull by the horns.' Personally, I would do the latter.


In these posts we share our experiences with various pieces of freely available software.


What the software is supposed to do is partly covered in publications, mostly as a statement of intent. Not much is available on how well it does in practice. It's the latter that these posts focus on.


We hope the blog will be found useful by others who are recent entrants.


stanford-parser-2005-07-21

This is a statistical parser, freely available for experimenting with and generally trying to figure out whether there is anything useful you could do with it.


We downloaded it a couple of years back, and at that point it said it needed a more recent version of Java than we had. We downloaded the latest version of Java and tried it out.


On the positive side, the lexparser worked fine, and we could see a nice parse tree diagram of the test sentence and of the other sentences we tried. Then we tried the command line, with an input file containing a few sentences, redirecting the Treebank-style parse tree output to another file.



It permits just one sentence in the input file. Add more and it complains of lack of heap space.
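One possible workaround, given the one-sentence limit we ran into, is to split a multi-sentence input file into one file per sentence and run the parser on each file separately. The sketch below assumes a naive splitter (break after '.', '!' or '?' followed by whitespace), which is our own assumption and not the parser's sentence splitter; the function name `split_into_sentence_files` and the file naming scheme are hypothetical.

```python
import re
from pathlib import Path


def split_into_sentence_files(text, out_dir, prefix="sent"):
    """Write each sentence of `text` to its own file in `out_dir`.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    This is a naive splitter, not the Stanford parser's own.
    Returns the paths written, in sentence order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Split after terminal punctuation, dropping empty fragments.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    paths = []
    for i, sentence in enumerate(sentences, start=1):
        # Hypothetical naming scheme: sent0001.txt, sent0002.txt, ...
        path = out / f"{prefix}{i:04d}.txt"
        path.write_text(sentence + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Each resulting file can then be fed to the parser in its own run, so no single run has to hold more than one sentence in memory. Abbreviations such as "Dr." will be split wrongly by this naive rule.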


To be fair, I must reproduce what the documentation says:

" To run the PCFG parser on sentences of up to 40 words you need 100 Mb of memory. To be able to handle longer sentences, you need more (to parse sentences up to 100 words, you need 400 Mb). "
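The documented figures can be turned into a rough rule of thumb for choosing the Java heap size. This sketch treats the two quoted data points as tiers (up to 40 words: 100 MB; up to 100 words: 400 MB), reads "Mb" as megabytes, and returns `None` beyond 100 words because the documentation gives no figure there. Word counts use a whitespace split, which only approximates the parser's tokenisation. `-Xmx` is the standard JVM maximum-heap option; the helper names are hypothetical, and we have not checked how the 2005 release's own scripts pass it through.

```python
# Memory needs for the PCFG parser, taken from the documentation quote:
# sentences up to 40 words -> 100 MB, up to 100 words -> 400 MB.
DOCUMENTED_TIERS = [(40, 100), (100, 400)]


def suggested_heap_mb(sentences):
    """Return the documented heap size (MB) for the longest sentence.

    Returns None if the longest sentence exceeds 100 words, since the
    documentation gives no figure for that case. Word counts are a
    whitespace split, an approximation of the parser's tokenisation.
    """
    longest = max((len(s.split()) for s in sentences), default=0)
    for max_words, mb in DOCUMENTED_TIERS:
        if longest <= max_words:
            return mb
    return None


def jvm_heap_flag(mb):
    """Format a maximum-heap option for the java launcher."""
    return f"-Xmx{mb}m"
```

For example, a file whose longest sentence has 55 words falls in the second tier and would suggest `-Xmx400m`.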


We recently tried it out once again on a machine.

It manages a lowly 2.08 seconds per word, i.e. 121 words in about 4.2 minutes.
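The arithmetic behind that figure is simple: 121 words at 2.08 s/word is 251.68 s, just under 4.2 minutes, or fewer than 29 words a minute. A small sketch (function names are ours, for illustration only):

```python
def parse_time_minutes(words, seconds_per_word):
    """Total parse time in minutes at a fixed per-word rate."""
    return words * seconds_per_word / 60.0


def words_per_minute(seconds_per_word):
    """Throughput implied by a per-word parse time."""
    return 60.0 / seconds_per_word


print(f"{parse_time_minutes(121, 2.08):.1f} min")   # total for 121 words
print(f"{words_per_minute(2.08):.1f} words/min")
```

This assumes a constant per-word rate; in practice a PCFG parser's time per word grows with sentence length, so the average hides a lot of variation.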



Now let's take a look at some claims about the state of the art:
"In particular, in our Java implementation on a 3GHz processor, it is possible to parse 1600 sentences in less than 900 sec. with an F1 of 91.2%. This compares favorably to the previously best generative lexicalized parser for English (Charniak & Johnson (2005): 90.7% in 1300 sec.)."
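To put the quoted numbers next to our own measurement, it helps to convert everything to seconds per sentence. The sketch below reads the Charniak & Johnson figure as also covering the same 1600 sentences (our reading of the quote), and treats 900 s as an upper bound, as the quote says "less than". Our measurement is per word, so converting it needs an average sentence length; `avg_words` is a parameter you must supply, and any value used is an assumption, not something we measured. Hardware also differs (3GHz versus our machine), so this is a rough comparison only.

```python
def seconds_per_sentence(sentences, seconds):
    """Average time per sentence for a batch."""
    return seconds / sentences


# Figures from the quote; 900 s is an upper bound ("less than 900 sec.").
petrov_klein = seconds_per_sentence(1600, 900)        # at most this
charniak_johnson = seconds_per_sentence(1600, 1300)   # assumes same 1600 sentences
claimed_speedup = 1300 / 900                          # at least this


def our_seconds_per_sentence(seconds_per_word, avg_words):
    """Convert a per-word rate to a per-sentence rate.

    avg_words is an ASSUMED average sentence length; it was not measured.
    """
    return seconds_per_word * avg_words


print(f"Petrov & Klein:     <= {petrov_klein:.4f} s/sentence")
print(f"Charniak & Johnson:    {charniak_johnson:.4f} s/sentence")
print(f"claimed speedup:    >= {claimed_speedup:.2f}x")
```

With a purely illustrative 25-word average, our 2.08 s/word would be about 52 s per sentence, roughly two orders of magnitude slower than the claim, which is why the numbers above look so striking.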

From "Learning and Inference for Hierarchically Split PCFGs" by Slav Petrov and Dan Klein.

I can't vouch for it. But wow!