<div> or <seg> or ...
m.margolin at utoronto.ca
Tue Nov 9 18:54:32 CET 2004
Hi everybody,
1. <div> vs. <seg>
My suggestion to use <seg> for encoding diplomatic parts is based solely
on the TEI definition, which states that <seg> marks a text fragment, and that
is exactly what a diplomatic part is. <div>, on the other hand, is more generic
by definition and may contain some metadata along with the text.
I think that we should always seek a balance between the specific and the
generic. The price of being too generic would be a performance penalty on any
kind of information retrieval. On the other hand, any attempt to enumerate the
content of the data (for example, to use precisely named elements inside
<tenor>) can make the encoding too restrictive and essentially inapplicable.
Therefore I suggest using <div type=document> instead of <document>,
keeping <tenor> because of its unambiguous meaning, and using <seg> to encode
any diplomatic part at any level.
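
To make the suggestion concrete, here is a minimal sketch of such an encoding,
parsed with Python's standard library. The type values ("protocol",
"invocatio", and so on) and the Latin text are illustrative assumptions only;
they are not taken from a real charter or from an agreed list of parts.

```python
import xml.etree.ElementTree as ET

# Hypothetical charter fragment: the type values and the text are
# assumptions made for illustration, not an agreed vocabulary.
CHARTER = """
<div type="document">
  <tenor>
    <seg type="protocol">
      <seg type="invocatio">In nomine Domini.</seg>
      <seg type="intitulatio">Ego Willelmus</seg>
    </seg>
    <seg type="dispositio">dedi et concessi terram meam</seg>
  </tenor>
</div>
"""

root = ET.fromstring(CHARTER)

# Every diplomatic part is a <seg>; what kind of part it is lives in
# the type attribute rather than in the element name.
part_types = [seg.get("type") for seg in root.iter("seg")]
print(part_types)
```

Only <div>, <tenor> and <seg> appear as element names, so a new kind of
diplomatic part needs a new type value but no change to the element set.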
2. Cartulary and Document.
In my understanding, the subject of the XML encoding is a medieval charter,
which might belong to one or more cartularies. From the implementation point of
view it would be unwise to encode multiple charters (a cartulary) in one text
file where special elements (like <div>) mark the boundaries of each charter.
Therefore the encoding of each charter should include only references to its
parent cartularies. The common approach to implementing a repository of
charters is to create an independent database (or file system) entry for each
charter.
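
A rough sketch of that layout, assuming one record per charter that carries
only references to its parent cartularies. All identifiers and file names
below are hypothetical; the point is that a cartulary's contents can be
derived from the charter records instead of being stored in one large file.

```python
from collections import defaultdict

# Hypothetical repository: one independent entry per charter, each
# holding only references (IDs) to the cartularies it belongs to.
charters = {
    "charter-0001": {"file": "charter-0001.xml", "cartularies": ["cart-A"]},
    "charter-0002": {"file": "charter-0002.xml", "cartularies": ["cart-A", "cart-B"]},
}

def charters_by_cartulary(records):
    """Build the reverse index: cartulary ID -> list of charter IDs."""
    index = defaultdict(list)
    for charter_id, record in sorted(records.items()):
        for cartulary_id in record["cartularies"]:
            index[cartulary_id].append(charter_id)
    return dict(index)

cartulary_index = charters_by_cartulary(charters)
print(cartulary_index)
```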
3. Georg Vogeler wrote:
a) << Maybe Michael Gervers or
<<Michael Margolin could give us an example where an alteration of the
<<CID definition of diplomatic parts might be necessary?
The main reason for suggesting a generic approach to diplomatic encoding is
that it is simply not possible to predict all content variations of real-world
documents (for example, we currently distinguish between "standard" and
"specific" parts among the <protocol> child elements, etc.). It is also very
difficult to predict the depth of encoding appropriate for a given group of
charters (for example, there may be more than two levels of subdivision of
diplomatic parts).
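
As an illustration of the depth argument, a short recursive walk handles any
number of nested <seg> levels without knowing the depth in advance. The sample
fragment and its type values are assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET

def seg_outline(element, depth=0):
    """Return (depth, type) pairs for every nested <seg>, in document order."""
    rows = []
    for child in element:
        if child.tag == "seg":
            rows.append((depth, child.get("type")))
            rows.extend(seg_outline(child, depth + 1))
        else:
            # Other elements do not add a level; keep looking inside them.
            rows.extend(seg_outline(child, depth))
    return rows

# Hypothetical fragment with three levels of subdivision.
SAMPLE = ET.fromstring(
    '<tenor><seg type="protocol"><seg type="intitulatio">'
    '<seg type="title">comes</seg></seg></seg></tenor>'
)

outline = seg_outline(SAMPLE)
for level, part_type in outline:
    print("  " * level + part_type)
```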
b)
<< I'm working here
<<with a Perl class on the possibilities of such a search engine -
<<hoping to present you some results at the end of the term). That
<<would mean that we have to stick to the terminology of the CID as
<<long as it gives us an appropriate term.
I see no technical problem for information retrieval when the name of a
diplomatic part is stored in an attribute rather than in the element name.
Industrial search engines and database search facilities can index XML
elements and support standard SQL queries based on attribute values.
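
A small sketch of what this could look like, using SQLite from Python's
standard library as a stand-in for whatever database is actually chosen. The
table layout, the charter ID and the sample text are assumptions; the sketch
only shows that a part name kept in the type attribute can be indexed and
queried with plain SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical encoded charter; type values are illustrative.
DOC = (
    '<div type="document"><tenor>'
    '<seg type="invocatio">In nomine Domini</seg>'
    '<seg type="dispositio">dedi et concessi</seg>'
    '</tenor></div>'
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (charter_id TEXT, part_type TEXT, content TEXT)")
# Index the column that holds the attribute value.
conn.execute("CREATE INDEX part_type_idx ON part (part_type)")

root = ET.fromstring(DOC)
rows = [
    ("charter-0001", seg.get("type"), "".join(seg.itertext()))
    for seg in root.iter("seg")
]
conn.executemany("INSERT INTO part VALUES (?, ?, ?)", rows)

# Standard SQL query on the attribute value.
hits = conn.execute(
    "SELECT charter_id, content FROM part WHERE part_type = ?",
    ("dispositio",),
).fetchall()
print(hits)
```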
Michael Margolin,
DEEDS Project,
University of Toronto