When I first looked at Antti J. Ylikoski's original message, I assumed (perhaps unrealistically) that natural language understanding was meant, since he said "I do not constrain...".
My messages should be read in that light. If this is not your interest, you may skip the rest.
Peter Drucker's dictum, or its equivalents "Look before you leap" or "Take a top-down view before getting bogged down in the nitty-gritty", is useful given that everybody is short of time.
Let me start by saying that, like others, I have gone through generations of natural language processing tools, studying the concepts and working through other people's code in addition to creating my own, each time saying "This is it. Eureka!", until I was dissuaded by a host of materials, the starting point being John F. Sowa's "The Challenge of Knowledge Soup" (see http://www.jfsowa.com/pubs/). So today I, like most others who have spent time on this, recognise that the problems here are currently unsolved and likely to remain so for a long, long time. Many actually believe that an artificial mind is what is needed.
That said, it is a fascinating and worthwhile mind sport, and given its usefulness I would be surprised if the big players (IBM, Microsoft, etc.) were not at it.
The least we can do is to understand the limitations of our current systems and how we might overcome them. To understand the limitations is to uncover the assumptions, often unstated, that underlie the current systems. In a word, make the implicit explicit. This is more in the nature of a philosopher's task, and that is the reason why I said "back to a philosophy class".
All the messages I happened to look at seemed to take for granted what I would call a language philosophy model = bag of words, and a reasoning philosophy model = predicate logic.
A little elaboration: nobody wants to use NLP only to do POS tagging and parsing. One wants to use it in practical applications, and to do that the most common approach is the use of predicate logic or its variants. So we really do have some assumptions about language. This set of assumptions can be summed up by the term "language philosophy model". In the bag-of-words model:
- we view language as a finite set of words and word collocations
- words are related to other words
- words can be categorised
- allowable word sequences are governed by a grammar
An alternative view of language (language philosophy model = concepts) could be that it is a set of concepts, not words. One difference is that there is no polysemy here. "Bank" as in financial institution is a different concept from "bank" as in the bank of the Ganges (a river) or a bank of radars. In WordNet these would be different senses of one word, but in a concept net they would be different concepts.
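To make the contrast concrete, here is a minimal sketch of the two views side by side. The data, the concept identifiers and the helper `concepts_for` are all made up for illustration; this is not WordNet's or any concept net's actual format.

```python
# Hypothetical toy data: identifiers and glosses are invented for illustration.

# Word-based view: one entry, "bank", carrying several senses (polysemy).
word_senses = {
    "bank": ["financial institution", "edge of a river", "row of similar devices"],
}

# Concept-based view: each meaning is its own node; the surface word is
# only a label attached to the concept.
concepts = {
    "BANK_FINANCE": {"label": "bank", "gloss": "financial institution"},
    "BANK_RIVER": {"label": "bank", "gloss": "edge of a river"},
    "BANK_ARRAY": {"label": "bank", "gloss": "row of similar devices"},
}

def concepts_for(label):
    """Return the ids of all concepts that share a surface label."""
    return sorted(cid for cid, c in concepts.items() if c["label"] == label)

print(len(word_senses["bank"]))  # one word, several senses
print(concepts_for("bank"))      # several distinct concepts, one label
```

In the first structure the unit is the word and ambiguity lives inside it; in the second the unit is the concept and the word is just one of its attributes.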
Similar remarks hold for the reasoning philosophy model. Is there an alternative to reasoning with predicates? Well, yes. Take a look at http://web.media.mit.edu/~hugo/conceptnet/
He doesn't use predicate logic. I am not endorsing that work. All I am saying is: don't shut your eyes to alternative models. Now try to figure out the relevance of NP-hardness in relation to ConceptNet: no predicate logic, simple node search. So before you categorise a problem, know what the problem is and the terms in which you have described it.
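As an illustration of what "simple node search" can mean, here is a minimal sketch of a breadth-first search over a tiny concept graph. The graph, the node names and the function `find_path` are invented for this example; they are not ConceptNet's data or its API.

```python
from collections import deque

# Hypothetical toy graph; these nodes and links are invented, not ConceptNet data.
graph = {
    "bank_finance": ["money", "building"],
    "money": ["buy", "wallet"],
    "building": ["city"],
    "buy": ["shop"],
    "wallet": [],
    "city": [],
    "shop": [],
}

def find_path(graph, start, goal):
    """Breadth-first search: return a shortest path of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

print(find_path(graph, "bank_finance", "shop"))
# ['bank_finance', 'money', 'buy', 'shop']
```

A search of this kind visits each node and each link at most once, so the complexity question looks quite different once the problem is stated in terms of nodes rather than predicates.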
Actually there exists a range of models (models in the sense of engineering or physical-science models, not Tarski-type models), depending on the philosophical beliefs embodied in the
- language philosophy model
- reasoning philosophy model
Once you are through with this phase and have taken a position, you will want to use the models to capture laws, rules or regularities of language and logic. The exact terms of reference in a law will depend on the two models above. Everybody knows what these are for the bag-of-words model: you look for regularity in terms of a noun phrase preceding a verb phrase, and so on. Your terms of reference will be words, word order, phrases, phrase order, etc.
It is at this point that you might wonder whether to use a rule-based search of the corpus or a statistical approach to capture the regularities. Or, quite simply, generate the rules by using volunteers to find the regularities.
I will not venture into the number and quality of the models, other than to say: they are there; find them, time permitting.
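The two options can be sketched on a toy example. The corpus below is hypothetical and already reduced to phrase labels; the rule `np_precedes_vp` and the bigram count are only meant to show the difference in approach, not a real method for either.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is already reduced to a sequence of
# phrase labels (NP = noun phrase, VP = verb phrase, PP = prepositional phrase).
corpus = [
    ["NP", "VP"],
    ["NP", "VP", "PP"],
    ["PP", "NP", "VP"],
    ["VP", "NP"],
]

def np_precedes_vp(sentence):
    """Rule-based check: does the first NP come before the first VP?"""
    if "NP" not in sentence or "VP" not in sentence:
        return False
    return sentence.index("NP") < sentence.index("VP")

# Rule-based search: which sentences obey the hand-written rule?
matches = [s for s in corpus if np_precedes_vp(s)]

# Statistical approach: count adjacent label pairs and let frequency speak.
bigrams = Counter((a, b) for s in corpus for a, b in zip(s, s[1:]))

print(len(matches), "of", len(corpus), "sentences have NP before VP")
print(bigrams.most_common(1))
```

In the first case the regularity is stated in advance and the corpus is searched for it; in the second the regularity is read off the counts. Either way the terms of reference are still phrases and phrase order, that is, those of the bag-of-words model.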
Another thing worth mentioning is analogical reasoning and the work of Douglas Hofstadter (http://en.wikipedia.org/wiki/Douglas_Hofstadter). I don't particularly believe in the utility of cognitive science, but yes, it is worth a dekko.
You may also look at http://formalsystemsphilosophy.blogspot.com/
Some author-specific messages
Wolf K
You are right when you say that "What makes natural languages impossible to formalise is metaphor". But need we formalise it in the sense of bottling it in a formal logic system which was originally designed for a different purpose, viz. the axiomatization of mathematics?
"For any given set of words, a variety of collocations are syntactically.." You are absolutely right within what I am referring to as language philosophy model = bag of words. Change the axioms (I am using the word very loosely, for belief in a set of views) and you will arrive at a different conclusion and perhaps a new set of problems. As the above examples show, there are alternatives to a statistical description.
Neil W Rickert
"And semantics does not easily map into rules of inference." The reason is that we are doing something we perhaps should not be doing. Semantics in a formal logic system would be defined in terms of the interpretation of the terms and compositionality, whereas I would rather define a sentence as meaningful if it makes sense to a human. "Green ideas sleep furiously" might not make sense to Noam Chomsky, but it does to me: I interpret it as meaning that newly developed ideas lie there sleeping for a long while before being put to use. Poetry, rhetoric and metaphor cannot be modelled by formal logic systems.
"ontology is crap". I can understand your frustration. But categorisation and thinking in terms of hierarchy are useful tools. I noticed that SUMO puts the same entity at two places in the tree. My solution was to add a tree basis, i.e. to use a hierarchy of entities specified not just by entity but by another attribute, the tree basis. E.g. Powerset (the company) will appear in the tree of Microsoft subsidiaries and sub-subsidiaries (basis = subsidiary). It can also appear in the tree of industries (basis = industry). You could add the time element also.
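A minimal sketch of the tree-basis idea, assuming parent links are keyed by the pair (entity, basis). The industry labels and the helper `ancestors` are hypothetical and only there to show the shape of the data; this is not SUMO's representation.

```python
# Hypothetical sketch: parent links are stored per basis, so one entity can
# sit in several trees without being duplicated inside any single tree.
parents = {
    ("Powerset", "subsidiary"): "Microsoft",
    ("Powerset", "industry"): "Search engines",   # invented industry label
    ("Search engines", "industry"): "Software",   # invented industry label
}

def ancestors(entity, basis):
    """Walk up the tree selected by `basis`, nearest ancestor first."""
    chain = []
    while (entity, basis) in parents:
        entity = parents[(entity, basis)]
        chain.append(entity)
    return chain

print(ancestors("Powerset", "subsidiary"))  # ['Microsoft']
print(ancestors("Powerset", "industry"))    # ['Search engines', 'Software']
```

Adding the time element would amount to extending the key, for example to (entity, basis, year), so that the same entity can move between parents over time.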
Ian Parker
"mathematics may be defined as the study of formal systems" This is one view, and certainly not the current one. See for example http://en.wikipedia.org/wiki/Penelope_Maddy
Translation in general is part of NLU rather than NLP, though you could approximate translations using statistical or other techniques.
The term "formal systems" is used by the majority to mean formal logic systems. I prefer to use it more broadly.
I am not too clear on how "it is possible to have a maximum entropy approach whereby grammar and meaning are both present in a "Hamiltonian"". Anyway, you seem to have missed the main point, viz. that we have to look beyond grammar and predicate logic. Maybe I have not communicated effectively.
Lotzi Boloni
"Computer language is a formal system" is not a definition of a programming language. It simply means that the language can be modelled using formal logic, which assures the programmer of a correct output under all logical circumstances, by simulation of the language on a computer. Abstraction removes such things as font, format, etc. You are right when you say that it does not fully model the real world.
Brian Martin
I think I agree with you that semantic modelling is more important, with syntax perhaps used only as an adjunct. Perhaps we could do direct semantic parsing. Semantic role labelling uses the parse as one feature, but even that is not essential if an alternative set of features could be put in place.