"GRE is too easy!
You are pretty good at finding one good referring expression.
Maybe you need to move on to something harder, like finding
Feedback from Mary Gardiner at the CLT, Macquarie University
This little bit of feedback made me think that perhaps we should indeed take a new perspective on REG: maybe we need to give up the fruitless search for the holy grail of "the one best" referring expression and set about the more realistic, and possibly more interesting, task of characterising all good referring expressions. The first question then is:
"What are the constraints that carve clearly "bad" descriptions off one end of the space of all computationally possible descriptions and clearly "good" ones from the other?"
As this is undeniably going to be heavily influenced by the type of domain we are talking about, the next questions are:
"What is it about this particular domain that makes certain referring expressions sound bad and others acceptable?"
"Can we find classes of domains that share characteristics making it easy to transfer constraints for finding acceptable referring expressions?"
Answering these questions could lead to a hierarchy of domains, similar to the diagram on the right.

To be able to conduct experiments involving human participants to elicit rankings of descriptions, I have implemented a simple realisation mechanism. It converts the edges of a descriptive graph (as in Krahmer et al. 2003) into natural language descriptions. I used the drawer domain from my initial experiment. I have so far limited the exploration to referring expressions involving no more than 3 drawers.
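To illustrate the idea of realising graph edges as text, here is a minimal sketch in Python. It is not the mechanism used for the experiments: the `Edge` type, the attribute names and the phrase templates are all assumptions made for this example. It follows the convention of Krahmer et al. (2003) that a property is an edge looping back to its own vertex, while a relation is an edge between two different vertices.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str  # vertex (drawer) the edge starts at
    label: str   # property or relation name
    target: str  # equals source for a property (loop edge)


# Hypothetical phrase templates, invented for this sketch.
PREMODIFIERS = {"orange": "orange", "blue": "blue"}
POSTMODIFIERS = {"top_row": "in the top row", "left_column": "in the left column"}
RELATIONS = {"above": "above", "left_of": "to the left of"}


def realise(vertex, edges, visited=frozenset()):
    """Realise the description graph `edges` as a noun phrase for `vertex`."""
    visited = visited | {vertex}
    loops = [e for e in edges if e.source == vertex and e.target == vertex]
    pre = [PREMODIFIERS[e.label] for e in loops if e.label in PREMODIFIERS]
    post = [POSTMODIFIERS[e.label] for e in loops if e.label in POSTMODIFIERS]
    # Relations are realised recursively; already visited drawers are
    # skipped so that cyclic graphs cannot cause infinite recursion.
    rels = [
        f"{RELATIONS[e.label]} {realise(e.target, edges, visited)}"
        for e in edges
        if e.source == vertex and e.target not in visited
    ]
    return " ".join(["the", *pre, "drawer", *post, *rels])


example = [
    Edge("d1", "orange", "d1"),
    Edge("d1", "above", "d2"),
    Edge("d2", "blue", "d2"),
    Edge("d2", "left_column", "d2"),
]
print(realise("d1", example))
```

Each loop edge contributes a modifier to the noun phrase for its vertex, and each relation edge contributes a prepositional phrase whose object is the noun phrase for the related drawer.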

You can download all possible referring expressions involving fewer than four drawers produced with the graph-based algorithm for the drawer domain.
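To give a feel for what "all computationally possible descriptions" means, the following sketch enumerates every combination of a target's own properties that singles it out. It is a deliberately simplified brute-force search, not the graph-based algorithm: it ignores relations to other drawers, and the four-drawer domain and its attribute names are invented for the example.

```python
from itertools import combinations

# Hypothetical toy domain: a 2x2 grid of drawers (invented for this sketch).
DOMAIN = {
    "d1": {"colour": "orange", "row": 1, "column": 1},
    "d2": {"colour": "blue", "row": 1, "column": 2},
    "d3": {"colour": "orange", "row": 2, "column": 1},
    "d4": {"colour": "blue", "row": 2, "column": 2},
}


def distinguishing_descriptions(target, domain):
    """Return every set of the target's properties that matches only the target."""
    props = sorted(domain[target].items())
    results = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            matches = [
                name
                for name, attrs in domain.items()
                if all(attrs.get(key) == value for key, value in combo)
            ]
            if matches == [target]:
                results.append(dict(combo))
    return results


descriptions = distinguishing_descriptions("d1", DOMAIN)
for description in descriptions:
    print(description)
```

Even in this tiny domain the search returns redundant descriptions (all three properties at once) alongside minimal ones, which is exactly the kind of variation that the constraints on "good" and "bad" descriptions have to sort out.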

Based on this list of all computationally possible descriptions, I devised a set of preliminary constraints for classifying clearly "bad" and clearly "good" referring expressions in grid-like domains containing uniform objects, such as the drawer domain. These constraints are described in the paper:

Jette Viethen and Robert Dale (2007). Capturing acceptable variation in distinguishing descriptions. Proceedings of the 11th European Workshop on Natural Language Generation, 113-120. Schloß Dagstuhl, Germany. [ pdf | slides ].