Indeed, it will be hard to normalize this. But it is important to normalize at least a little, and the CID terminology can be a good starting point. If we normalize, we can run, for example, this type of query: find the phrase "in nomine Dei" in the protocol. If we don't normalize the content of our attribute, we cannot run this query.
Gautier
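To make the query above concrete, here is a minimal sketch in Python. It assumes a hypothetical generic <part> element whose "type" attribute carries the diplomatic term. The element and attribute names and the sample charters are invented for illustration and are not taken from any agreed schema.

```python
import xml.etree.ElementTree as ET

# Invented sample data. <part type="..."> is a hypothetical generic encoding;
# charter c2 deliberately uses a non-normalized attribute value ("Protokoll").
SAMPLE = """
<charters>
  <charter id="c1">
    <part type="protocol">In nomine Dei. Ego Willelmus.</part>
    <part type="text">Sciant presentes et futuri.</part>
  </charter>
  <charter id="c2">
    <part type="Protokoll">In nomine Dei amen.</part>
  </charter>
  <charter id="c3">
    <part type="text">Actum in nomine Dei.</part>
  </charter>
</charters>
"""


def find_in_part(root, part_type, phrase):
    """Return ids of charters where `phrase` occurs in a part of `part_type`."""
    hits = []
    for charter in root.findall("charter"):
        # Exact match on the attribute value: this is why normalization matters.
        for part in charter.findall(f"part[@type='{part_type}']"):
            text = "".join(part.itertext())
            if phrase.lower() in text.lower():
                hits.append(charter.get("id"))
    return hits


root = ET.fromstring(SAMPLE.strip())
print(find_in_part(root, "protocol", "in nomine Dei"))  # ['c1']
```

Charter c2 contains the phrase in its protocol as well, but it is missed because its attribute value is "Protokoll" rather than the normalized "protocol". Charter c3 is correctly excluded because the phrase occurs outside the protocol.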
3. Georg Vogeler wrote:
a)
<< Maybe Michael Gervers or Michael Margolin could give us an example where an alteration of the CID definition of diplomatic parts might be necessary?
The main reason for suggesting a generic approach to diplomatic encoding is that it is simply not possible to predict all the content variations of real-world documents (for example, we currently distinguish between "standard" and "specific" parts of the <protocol> child elements, etc.). It is also very difficult to predict the depth of encoding appropriate for a given group of charters (for example, there may be more than two levels of subdivision of the diplomatic parts).
b)
<< I'm working here with a Perl class on the possibilities of such a search engine - hoping to present you some results at the end of the term. That would mean that we have to stick to the terminology of the CID as long as it gives us an appropriate term.
Technically, I see no problem from the point of view of information retrieval when the name of a diplomatic part is stored in an attribute rather than in the element name. Industrial search engines and database search facilities can index XML elements and run standard SQL queries based on the attribute values.
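As a rough illustration of this point, the sketch below loads generically encoded parts into a relational table and queries them by attribute value with plain SQL. It is only a sketch under stated assumptions: SQLite (via Python's standard sqlite3 module) stands in for whatever database or search engine would actually be used, and the <part type="..."> markup, the table layout, and the sample charters are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample data using a hypothetical generic <part type="..."> encoding.
SAMPLE = """
<charters>
  <charter id="c1">
    <part type="protocol">In nomine Dei amen.</part>
    <part type="text">Sciant presentes et futuri.</part>
  </charter>
  <charter id="c2">
    <part type="protocol">Universis Christi fidelibus salutem.</part>
  </charter>
  <charter id="c3">
    <part type="text">Actum in nomine Dei.</part>
  </charter>
</charters>
"""


def build_index(xml_text):
    """Store one row per <part>, keeping the attribute value as a column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE part (charter_id TEXT, part_type TEXT, content TEXT)"
    )
    # Index on the attribute value so lookups by diplomatic part are cheap.
    conn.execute("CREATE INDEX idx_part_type ON part (part_type)")
    root = ET.fromstring(xml_text.strip())
    for charter in root.findall("charter"):
        for part in charter.findall("part"):
            conn.execute(
                "INSERT INTO part VALUES (?, ?, ?)",
                (charter.get("id"), part.get("type"), "".join(part.itertext())),
            )
    return conn


conn = build_index(SAMPLE)
# SQLite's LIKE is case-insensitive for ASCII letters by default.
rows = conn.execute(
    "SELECT charter_id FROM part "
    "WHERE part_type = ? AND content LIKE ? ORDER BY charter_id",
    ("protocol", "%in nomine Dei%"),
).fetchall()
print(rows)  # [('c1',)]
```

The query never refers to an element name, only to the stored attribute value, so adding a new kind of diplomatic part requires no change to the table or to the query shape.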
Michael Margolin, DEEDS Project, University of Toronto