<div> or <seg> or ...

Gautier Poupeau gpoupeau at enc.sorbonne.fr
Tue Nov 9 19:45:34 CET 2004


Indeed, it will be hard to normalize this. But it is important to
normalize a little, and the CID terminology can be a good starting point.
If we normalize, we could, for example, run this type of query: find the
term "in nomine Dei" in the protocol. If we don't normalize the content of
our attribute, we couldn't run this query.
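A minimal sketch of that query, in Python with the standard-library ElementTree module. It assumes, purely for illustration, that each diplomatic part is encoded as a generic <div> whose "type" attribute holds the part name; the charter ids, texts and the function charters_with_phrase are hypothetical, not CEI markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical encodings of three charters. The <div type="..."> markup and
# the attribute values are illustrative assumptions, not CEI or CID markup.
CHARTERS = {
    "charter_a": '<charter><div type="protocol">In nomine Dei amen.</div>'
                 '<div type="context">Notum sit omnibus ...</div></charter>',
    "charter_b": '<charter><div type="protocole">In nomine Dei et Salvatoris.</div>'
                 '</charter>',
    "charter_c": '<charter><div type="Protokoll">In nomine Dei.</div></charter>',
}


def charters_with_phrase(charters, part, phrase):
    """Return ids of charters whose <div type=part> contains phrase (case-insensitive)."""
    hits = []
    for charter_id, xml_text in charters.items():
        root = ET.fromstring(xml_text)
        # ElementTree's limited XPath supports [@attr='value'] predicates.
        for div in root.iterfind(f".//div[@type='{part}']"):
            text = "".join(div.itertext())
            if phrase.lower() in text.lower():
                hits.append(charter_id)
                break  # one hit per charter is enough
    return hits


print(charters_with_phrase(CHARTERS, "protocol", "in nomine Dei"))
```

Only charter_a is found: the other two contain the phrase in their protocol too, but their attribute values ("protocole", "Protokoll") are invisible to a query for "protocol". With a normalized vocabulary such as the CID terms, one query would reach all three.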

Gautier

>3. Georg Vogeler wrote:
>  a) << Maybe Michael Gervers or
>   <<Michael Margolin could give us an example where an alteration of the
>    <<CID definition of diplomatic parts might be necessary?
>    The main reason for suggesting a generic approach to the diplomatic encoding is
>that it is simply not possible to predict all content variations of real-world
>documents (for example, we currently distinguish between "standard" and
>"specific" parts of the <protocol> child elements, etc.). It is also very
>difficult to predict the depth of encoding appropriate for a given group
>of charters (for example, there may be more than two levels of subdivision
>of the diplomatic parts).
>   b)
><< I'm working here
><<with a Perl class on the possibilities of such a search engine,
><<hoping to present some results to you at the end of the term). That
><<would mean that we have to stick to the terminology of the CID as
><<long as it gives us an appropriate term.
>    
>    Technically, I see no problem from the point of view of information
>retrieval when the element name is stored in an attribute rather than
>in the element name itself. Industrial search engines and database search
>facilities can index XML elements and run standard SQL queries based on
>attribute values.
>
>Michael Margolin,
>DEEDS Project,
>University of Toronto
>
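To illustrate the point about attribute-based indexing, here is a sketch using only the Python standard library, with SQLite standing in for the industrial search engines and databases mentioned above. It loads one charter encoded with a <protocol> element and one encoded with <div type="protocol"> into the same table and finds both with a single SQL query. The markup, the table layout and the function part_rows are hypothetical, and the sketch only looks at direct children of the root element.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same diplomatic part: one uses the part
# name as the element name, the other stores it in a "type" attribute.
ELEMENT_STYLE = "<charter><protocol>In nomine Dei amen.</protocol></charter>"
ATTRIBUTE_STYLE = '<charter><div type="protocol">In nomine Dei amen.</div></charter>'


def part_rows(charter_id, xml_text):
    """Yield (charter_id, part_name, text) for each direct child of the root."""
    root = ET.fromstring(xml_text)
    for child in root:
        # Generic <div>: take the part name from the attribute;
        # otherwise fall back to the element name itself.
        part = child.get("type") if child.tag == "div" else child.tag
        yield charter_id, part, "".join(child.itertext())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (charter_id TEXT, part TEXT, text TEXT)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", part_rows("a", ELEMENT_STYLE))
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", part_rows("b", ATTRIBUTE_STYLE))

# SQLite's LIKE is case-insensitive for ASCII letters by default.
rows = conn.execute(
    "SELECT charter_id FROM parts WHERE part = ? AND text LIKE ? ORDER BY charter_id",
    ("protocol", "%in nomine Dei%"),
).fetchall()
print(rows)
```

Once both encodings are mapped to the same (part, text) rows, the choice between element names and attribute values no longer affects the query; what still matters is that the part names themselves are normalized.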




More information about the cei-l mailing list