DFG Project "Middle High German Grammar": Section Bonn - Compilation of hyperlinks

© 2004 Frank Scheerer
Last update: October 20th, 2008


Introduction

The individual steps required to digitize a manuscript text and prepare it for further grammatical analysis are listed below. Every menu item offers the opportunity to view a reduced-size image of the manuscript, so that each stage of the digitization process can be compared with the manuscript itself.

Collating

The manuscript text is entered into the computer letter by letter. The transcription preserves capitalization, abbreviations, and short forms as well as superscripts. Punctuation is carried over into the electronic version, too. Furthermore, a line count is added at this stage, which is used to identify every single word by a unique number.



Example of a text that has not yet been pre-edited {Die Lilie, 10v,01-11r,02}:

{010v,01} *(D*)at is wider mu\ode van wereltliche\-
{010v,02} $chanden. ove van dode irerlifho
{010v,03} uede. oue van ires $elue$ $uchede\-.
{010v,04} *(D*)it ge$chit den dumben luden. di
{010v,05} denkent al$e $i $ich gode nekent.
{010v,06} dat $i en gein vuel beruren i\-$ule.
{010v,07} *(S*)i in gedenkent des gerehten ma\-
{010v,08} ne$ niet. *(J*)obpes. den got $elue
{010v,09} louede dat ime en gein men$che
{010v,10} in $inen ciden up ertriche were
{010v,11} gelich. inde gaf ime doch deme
{010v,12} uiende zebecorene an $ineme gu\o
{010v,13} de. an $inen kinden. inde an $ine
{010v,14} me ulei$che *(N*)iman en was al$e
{011r,01} gut al$e iob. niman in wart $o $ere
{011r,02} becort al$e iob.
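
The line markers in curly braces make it possible to derive an identifier for every word of the collated text. The following Python sketch illustrates the idea. The function name `number_words` and the identifier format (folio, side, line, and a two-digit word position) are assumptions made for this illustration; they are not the project's actual numbering scheme.

```python
import re

# Matches a line marker such as "{010v,01}" followed by the transcribed text.
LINE_MARKER = re.compile(r"^\{(\d{3})([rv]),(\d{2})\}\s*(.*)$")


def number_words(line: str) -> list[tuple[str, str]]:
    """Split one collated line into (word_id, token) pairs.

    The identifier format "folio+side,line.position" is hypothetical.
    """
    match = LINE_MARKER.match(line)
    if match is None:
        raise ValueError(f"no line marker found in {line!r}")
    folio, side, line_no, text = match.groups()
    return [
        (f"{folio}{side},{line_no}.{position:02d}", token)
        for position, token in enumerate(text.split(), start=1)
    ]


# Last line of the sample passage above.
sample = r"{011r,02} becort al$e iob."
for word_id, token in number_words(sample):
    print(word_id, token)
```

This simple whitespace split treats the two halves of a word that is broken across a line end (for example `ma\-` and `ne$`) as separate tokens, so a real numbering would have to join them first.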

Pre-Editing

At this stage of the digitization process the text is supplemented with modern punctuation, which is encoded in a manner that keeps it distinguishable from the manuscript text. In addition, word boundaries are checked and standardized, and separable prefix verbs are encoded in a special way. Throughout these steps the manuscript text is kept in its original form; that is, it is not normalized. This is possible because everything added during the pre-editing phase uses special encodings, which make the additions recognizable as text-external at any time.



Example of a pre-edited text {Die Lilie, 10v,01-11r,02}:

{010v,01} *(D*)at is wider#mu\ode van wereltliche\-
{010v,02} $chanden. ove van dode irer|lifho(=)
{010v,03} uede. oue van ires $elue$ $uchede\-.(.)
{010v,04} *(D*)it ge$chit den dumben luden.,, di
{010v,05} denkent,, al$e $i $ich gode nekent.,,
{010v,06} dat $i en#gein vuel beruren i\-|$ule.(.)
{010v,07} *(S*)i in gedenkent des gerehten ma\-(=)
{010v,08} ne$ niet.,, *(J*)obpes.,, den got $elue
{010v,09} louede,, dat ime +E ime: {i< n>e} W @E en#gein men$che
{010v,10} in $inen ciden up ertriche were
{010v,11} gelich.,, inde gaf ime doch deme
{010v,12} uiende ze|becorene an $ineme gu\o(=)
{010v,13} de., an $inen kinden. inde an $ine(=)
{010v,14} me ulei$che(.) *(N*)iman en was al$e
{011r,01} gut al$e iob.,, niman in wart $o $ere
{011r,02} becort al$e iob.(.)
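
Because the pre-editing additions are encoded distinctly, they can be removed again to recover the collated text. The Python sketch below shows this under stated assumptions: the list of markers (`(=)`, `(.)`, `,,`, `|`, `#`, and inline notes between `+E` and `@E`) is inferred solely from comparing the two sample passages on this page, and the function name `strip_pre_editing` is hypothetical. It is not the project's own tooling.

```python
import re

# Inline editorial note, e.g. "+E ime: {i< n>e} W @E" (inferred from the sample).
EDITOR_NOTE = re.compile(r"\+E.*?@E")

# Marks that appear only in the pre-edited sample (inferred, not an official list):
# "(=)" word continues on the next line, "(.)" and ",," modern punctuation,
# "|" boundary inserted inside a string written as one word.
ADDED_MARKS = ("(=)", "(.)", ",,", "|")


def strip_pre_editing(line: str) -> str:
    """Return the collated form of a pre-edited line."""
    line = EDITOR_NOTE.sub("", line)
    for mark in ADDED_MARKS:
        line = line.replace(mark, "")
    # "#" links two separately written parts; the manuscript has a space there.
    line = line.replace("#", " ")
    # Collapse the double space left behind by a removed editorial note.
    return " ".join(line.split())


pre_edited = r"{010v,06} dat $i en#gein vuel beruren i\-|$ule.(.)"
print(strip_pre_editing(pre_edited))
```

For the line shown, the result is identical to line {010v,06} of the collated example. The sample line {010v,13} also contains a single comma after a full stop (`de.,`), which this sketch deliberately leaves untouched because its status cannot be determined from the examples alone.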

Indexing

Each word is analyzed grammatically and linked to a lemma from the "Mittelhochdeutsches Wörterbuch" (Middle High German dictionary). Based on that link, a so-called normalized word form is generated, which can be compared with the written word form. The resulting index can be further analyzed with respect to individual phenomena such as transliteration, word formation, morphology, etc.
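
As a rough illustration of what comparing a normalized word form with a written word form can look like, the following Python sketch lists the stretches in which two forms differ, using the standard library's `difflib.SequenceMatcher`. The function name `spelling_differences` is hypothetical and the approach is a generic character alignment, not the project's method. The two pairs used (gote / gode and gab / gaf) are taken from the passage of "Die Lilie" indexed on this page.

```python
from difflib import SequenceMatcher


def spelling_differences(normalized: str, written: str) -> list[tuple[str, str]]:
    """List the (normalized, written) substrings where the two forms diverge."""
    matcher = SequenceMatcher(a=normalized, b=written, autojunk=False)
    return [
        (normalized[i1:i2], written[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(spelling_differences("gote", "gode"))  # normalized t, written d
print(spelling_differences("gab", "gaf"))    # normalized b, written f
```

Before such a comparison the written form would have to be freed of punctuation and pre-editing marks, and special encodings such as `$` would need to be resolved; the sketch assumes this has already been done.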



Example of an indexed text {Die Lilie, 10v,01-11r,02}:

One line per word. The leading character identifies each field: ¨ lemma, # part of speech, ^ inflection, ~ normalized word form, \ written word form. These are followed by the word's reference number and a final field.

¨dër            #pron        ^NSn    ~daz              \*(D*)at         03,32.03   11 :
¨sîn            #anv         ^3SGI   ~is               \is              03,32.04   11 :
¨wider-müète    #n           ^NS     ~wider-müète      \wider#mu\ode    03,32.05   11 :
¨von            #präp        ^       ~von              \van             03,32.06   11 :
¨wër(e)lt-lich  #adj         ^DP     ~wër(e)lt-lîchen  \wereltliche\-   03,32.07   11 :
¨schande        #f           ^DP     ~schanden         \$chanden.       03,32.08   11 :
¨oder           #konj        ^       ~obe              \ove             03,32.09@  11 :
¨von            #präp        ^       ~von              \van             03,33.01   11 :
¨tôd            #m<(u)>      ^DS     ~tôde             \dode            03,33.02   11 :
¨ir(e)          #pron poss   ^GSf    ~ir(e)r           \irer|           03,33.03   11 :
¨lîb-habede     #f           ^GS     ~lîb-habede       \|lifho(=)uede.  03,33.04   11 :
¨oder           #konj        ^       ~obe              \oue             03,33.05   11 :
¨von            #präp        ^       ~von              \van             03,33.06   11 :
¨ir(e)          #pron poss   ^GSm    ~ir(e)s           \ires            03,33.07   11 :
¨sëlb           #pron        ^GSm    ~sëlbes           \$elue$          03,33.08   11 :
¨siuchede       #f           ^DP     ~siucheden        \$uchede\-.(.)   03,33.09   11 :
¨dise           #pron        ^NSn    ~dit              \*(D*)it         03,33.10   11 :
¨ge-schëhen     #stv5        ^3SGI   ~ge-schiè{he}t    \ge$chit         03,33.11   11 :
¨dër            #art         ^DP     ~dën              \den             03,33.12   11 :
¨tumb           #adj         ^DP     ~tumben           \dumben          03,33.13@  11 :
¨liut           #mn          ^DP     ~liuten           \luden.,,        03,34.01   11 :
¨dër            #pron        ^NP     ~di               \di              03,34.02   11 :
¨dènken         #swv-a       ^3PGI   ~dènkent          \denkent,,       03,34.03   11 :
¨al-sô          #konj        ^       ~al-se            \al$e            03,34.04   11 :
¨ër             #pron        ^NP     ~si               \$i              03,34.05   11 :
¨sich           #pron        ^A      ~sich             \$ich            03,34.06   11 :
¨got            #m           ^DS     ~gote             \gode            03,34.07   11 :
¨næhen          #swv         ^3PGI   ~næhent           \nekent.,,       03,34.08   11 :
¨daz            #konj        ^       ~daz              \dat             03,34.09   11 :
¨ër             #pron        ^AP     ~si               \$i              03,34.10   11 :
¨nehèin         #pron        ^0      ~engèin           \en#gein         03,34.11   11 :
¨übel           #n           ^NS     ~übel             \vuel            03,34.12   11 :
¨be-rüèren      #swv         ^i      ~be-rüèren        \beruren         03,34.13   11 :
¨ne             #partikel    ^       ~en               \i\-|            03,34.14   11 :
¨sol(e)n        #anv         ^3SGK   ~sul(e)           \|$ule.(.)       03,34.15@  11 :
¨ër             #pron        ^NP     ~si               \*(S*)i          03,35.01   11 :
¨ne             #partikel    ^       ~en               \in              03,35.02   11 :
¨ge-dènken      #swv-a       ^3PGI   ~ge-dènkent       \gedenkent       03,35.03   11 :
¨dër            #art         ^GSm    ~dës              \des             03,35.04   11 :
¨ge-rëht        #adj         ^GSmw   ~ge-rëhten        \gerehten        03,35.05   11 :
¨mann           #m           ^GS     ~mannes           \ma\-(=)ne$      03,35.06   11 :
¨niht           #adv         ^       ~nièt             \niet.,,         03,35.07   11 :
¨Job            #EN          ^GS     ~jobes            \*(J*)obpes.,,   03,35.08   11 :
¨dër            #pron        ^ASm    ~dën              \den             03,35.09   11 :
¨got            #m           ^NS     ~got              \got             03,35.10   11 :
¨sëlb           #pron        ^NSmw   ~sëlbe            \$elue           03,35.11   11 :
¨loben          #swv         ^3SV    ~lobete           \louede,,        03,35.12   11 :
¨daz            #konj        ^       ~daz              \dat             03,35.13@  11 :
¨ër             #pron        ^DSm    ~ime              \ime             04,01.01   11 :
¨nehèin         #pron        ^0NSm   ~engèin           \en#gein         04,01.02   11 :
¨mèn(ni)sche    #m           ^NS     ~mènsche          \men$che         04,01.03   11 :
¨in             #präp        ^       ~in               \in              04,01.04   11 :
¨sîn            #pron poss   ^DP     ~sînen            \$inen           04,01.05   11 :
¨zît            #fn          ^DP     ~zîten            \ciden           04,01.06   11 :
¨ûf             #präp        ^       ~ûf               \up              04,01.07   11 :
¨ërd-rîche      #n           ^DS     ~ërd-rîche        \ertriche        04,01.08   11 :
¨wësen          #stv5        ^3SVK   ~wære             \were            04,01.09   11 :
¨ge-lîch        #adj         ^-      ~ge-lîch          \gelich.,,       04,01.10   11 :
¨unde           #konj        ^       ~inde             \inde            04,01.11   11 :
¨gëben          #stv5        ^3SVI   ~gab              \gaf             04,01.12   11 :
¨ër             #pron        ^-/ASm  ~ime              \ime             04,01.13   11 :\[!]
¨doch           #adv         ^       ~doch             \doch            04,01.14@  11 :
¨dër            #art         ^DSm    ~dëme             \deme            04,02.01   11 :
¨vîand          #m           ^DS     ~vîande           \uiende          04,02.02   11 :
¨ze             #präp        ^       ~ze               \ze|             04,02.03   11 :
¨be-kor(e)n     #swv         ^iD     ~be-kor(e)ne      \|becorene       04,02.04   11 :
¨ane            #präp        ^       ~an               \an              04,02.05   11 :
¨sîn            #pron poss   ^DSn    ~sîneme           \$ineme          04,02.06   11 :
¨guot           #n           ^DS     ~guote            \gu\o(=)de.,     04,02.07   11 :
¨ane            #präp        ^       ~an               \an              04,02.08   11 :
¨sîn            #pron poss   ^DP     ~sînen            \$inen           04,02.09   11 :
¨kind           #n           ^DP     ~kinden           \kinden.         04,02.10   11 :
¨unde           #konj        ^       ~inde             \inde            04,02.11   11 :
¨ane            #präp        ^       ~an               \an              04,02.12   11 :
¨sîn            #pron poss   ^DSn    ~sîneme           \$ine(=)me       04,02.13@  11 :
¨vlèisch        #n           ^DS     ~vlèische         \ulei$che(.)     04,03.01   11 :
¨niè-mann       #pron subst  ^NS     ~niè-man          \*(N*)iman       04,03.02   11 :
¨ne             #partikel    ^       ~en               \en              04,03.03   11 :
¨wësen          #stv5        ^3SVI   ~was              \was             04,03.04   11 :
¨al-sô          #adv         ^       ~al-se            \al$e            04,03.05   11 :
¨guot           #adj         ^-      ~guot             \gut             04,03.06   11 :
¨al-sô          #konj        ^       ~al-se            \al$e            04,03.07   11 :
¨Job            #EN          ^NS     ~Job              \iob.,,          04,03.08   11 :
¨niè-mann       #pron subst  ^NS     ~niè-man          \niman           04,03.09   11 :
¨ne             #partikel    ^       ~en               \in              04,03.10   11 :
¨wërden         #stv3b       ^3SVI   ~wart             \wart            04,03.11   11 :
¨sô             #adv         ^       ~sô               \$o              04,03.12   11 :
¨sêre           #adv         ^       ~sêre             \$ere            04,03.13   11 :
¨be-kor(e)n     #swv         ^pV     ~be-kort          \becort          04,03.14@  11 :
¨al-sô          #konj        ^       ~al-se            \al$e            04,04.01   11 :
¨Job            #EN          ^NS     ~Job              \iob.(.)         04,04.02   11 :
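
Since every field of the index opens with its own marker character, the fields belonging to one word can be turned into a keyed record with very little code. The Python sketch below is a minimal illustration. The key names in `FIELD_NAMES` and the function name `build_record` are assumptions introduced here, and only the five marker-prefixed fields are handled. The sample entry is the word gode from line {010v,05}.

```python
# Marker character -> descriptive key. The key names are hypothetical labels.
FIELD_NAMES = {
    "¨": "lemma",
    "#": "part_of_speech",
    "^": "inflection",
    "~": "normalized_form",
    "\\": "written_form",
}


def build_record(fields: list[str]) -> dict[str, str]:
    """Turn the marker-prefixed fields of one word into a dictionary."""
    record = {}
    for field in fields:
        marker, value = field[:1], field[1:]
        if marker not in FIELD_NAMES:
            raise ValueError(f"unknown field marker in {field!r}")
        record[FIELD_NAMES[marker]] = value
    return record


entry = ["¨got", "#m", "^DS", "~gote", "\\gode"]
print(build_record(entry))
```

An empty inflection field (a bare `^`) simply yields an empty string, which keeps uninflected words such as prepositions in the same record shape as all others.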