Introduction
The individual steps required to digitize a handwritten text and prepare it for further
grammatical analysis are listed below.
Every menu item offers a reduced image of the manuscript, so that each stage of the
digitization process can be compared with the handwriting.
Collating
The handwritten text is entered into the computer letter by letter.
The transcription takes capitalization, abbreviations, short forms and superscripts into account,
and the punctuation of the manuscript is carried over into the electronic version as well.
In addition, line numbering is added at this stage; it is used to identify every single word
by a unique number.
Example of a text that has not yet been pre-edited {Die Lilie, 10v,01-11r,02}:
{010v,01} *(D*)at is wider mu\ode van wereltliche\-
{010v,02} $chanden. ove van dode irerlifho
{010v,03} uede. oue van ires $elue$ $uchede\-.
{010v,04} *(D*)it ge$chit den dumben luden. di
{010v,05} denkent al$e $i $ich gode nekent.
{010v,06} dat $i en gein vuel beruren i\-$ule.
{010v,07} *(S*)i in gedenkent des gerehten ma\-
{010v,08} ne$ niet. *(J*)obpes. den got $elue
{010v,09} louede dat ime en gein men$che
{010v,10} in $inen ciden up ertriche were
{010v,11} gelich. inde gaf ime doch deme
{010v,12} uiende zebecorene an $ineme gu\o
{010v,13} de. an $inen kinden. inde an $ine
{010v,14} me ulei$che *(N*)iman en was al$e
{011r,01} gut al$e iob. niman in wart $o $ere
{011r,02} becort al$e iob.
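The line counting can be illustrated with a minimal Python sketch. It assumes that every collated line begins with a {folio,line} prefix as in the example above ({010v,01} = folio 10 verso, line 1) and numbers the space-separated written units within each line. The names Token and number_words and the ID format folio,line.position are hypothetical; the word numbers in the indexed example further down follow a different reference scheme. Word boundaries are not yet unified at this stage, so every space-separated unit counts as one word.

```python
import re
from typing import Iterator, NamedTuple

# Assumed line format, taken from the example: "{010v,01} <transcribed text>"
LINE = re.compile(r"^\{(?P<folio>\d+[rv]),(?P<line>\d+)\}\s*(?P<text>.*)$")


class Token(NamedTuple):
    word_id: str  # hypothetical ID: folio,line.position
    form: str     # written form exactly as transcribed


def number_words(lines: list[str]) -> Iterator[Token]:
    """Give every space-separated written unit a unique number per folio line."""
    for raw in lines:
        match = LINE.match(raw)
        if match is None:
            raise ValueError(f"missing {{folio,line}} prefix: {raw!r}")
        folio, line = match["folio"], match["line"]
        for position, form in enumerate(match["text"].split(), start=1):
            yield Token(f"{folio},{line}.{position:02d}", form)


sample = [
    r"{010v,01} *(D*)at is wider mu\ode van wereltliche\-",
    r"{011r,02} becort al$e iob.",
]
for token in number_words(sample):
    print(token.word_id, token.form)
```

Because the ID combines folio, line and position, it stays unique across the whole manuscript as long as each folio line is numbered only once.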
Pre-Editing
In this stage of the digitization process, the text is supplemented with modern punctuation,
which is encoded so that it always remains distinguishable from the handwritten text.
In addition, word boundaries are checked and unified, and separable prefix verbs
are encoded in a special way.
Throughout these steps the handwritten text is kept in its original form, i.e. it is not normalized:
every addition made during pre-editing uses a special encoding that marks it as text-external at all times.
Example of a pre-edited text {Die Lilie, 10v,01-11r,02}:
{010v,01} *(D*)at is wider#mu\ode van wereltliche\-
{010v,02} $chanden. ove van dode irer|lifho(=)
{010v,03} uede. oue van ires $elue$ $uchede\-.(.)
{010v,04} *(D*)it ge$chit den dumben luden.,, di
{010v,05} denkent,, al$e $i $ich gode nekent.,,
{010v,06} dat $i en#gein vuel beruren i\-|$ule.(.)
{010v,07} *(S*)i in gedenkent des gerehten ma\-(=)
{010v,08} ne$ niet.,, *(J*)obpes.,, den got $elue
{010v,09} louede,, dat ime +E ime: {i< n>e} W @E en#gein men$che
{010v,10} in $inen ciden up ertriche were
{010v,11} gelich.,, inde gaf ime doch deme
{010v,12} uiende ze|becorene an $ineme gu\o(=)
{010v,13} de., an $inen kinden. inde an $ine(=)
{010v,14} me ulei$che(.) *(N*)iman en was al$e
{011r,01} gut al$e iob.,, niman in wart $o $ere
{011r,02} becort al$e iob.(.)
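Because the pre-editing codes are text-external, they can be stripped mechanically to recover the collated transcription, which is one way to check that the handwritten text was left untouched. The Python sketch below assumes code meanings inferred only from the two examples in this section; the function name strip_pre_editing is hypothetical, and treating a single comma directly after a period (as in "de.,") as editorial is an assumption.

```python
import re

# Code meanings inferred from the two examples only (assumptions, not a specification):
#   +E ... @E  editorial note inserted into the line
#   ,,  (.)    modern comma and modern full stop
#   (=)        word continues on the next line
#   |          word boundary inserted by the editor
#   #          manuscript space bridged, i.e. two written units form one word
EDITORIAL_NOTE = re.compile(r"\s*\+E.*?@E")


def strip_pre_editing(line: str) -> str:
    """Remove text-external pre-editing codes and return the collated transcription."""
    line = EDITORIAL_NOTE.sub("", line)
    line = line.replace(",,", "")
    # Assumption: a single comma directly after a period (as in "de.,") is editorial too.
    line = line.replace(".,", ".")
    for code in ("(.)", "(=)", "|"):
        line = line.replace(code, "")
    return line.replace("#", " ")


print(strip_pre_editing(r"{010v,09} louede,, dat ime +E ime: {i< n>e} W @E en#gein men$che"))
```

Applied to each of the sixteen pre-edited lines above, this function returns exactly the corresponding line of the collated example; whether the same rules hold for other texts is not established here.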
Indexing
Each word is analyzed grammatically and linked to a lemma taken from the
"Mittelhochdeutsches Wörterbuch" (Middle High German Dictionary).
Based on this link, a so-called normalized word form is generated, which
can be compared with the written word form. The resulting index can then be analyzed further
with respect to individual phenomena such as transliteration, word formation, morphology, etc.
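Comparing the normalized with the written form can be sketched by aligning the two strings letter by letter. The Python example below is an illustration only: it uses the standard library's difflib.SequenceMatcher for the alignment and drops diacritics first, so vowel length and similar marks are ignored; both choices are assumptions, not the project's procedure. The function names are hypothetical, and the word pairs come from the indexed example below with punctuation and pre-editing codes removed.

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher


def base_letters(form: str) -> str:
    """Drop diacritics (circumflex, diaeresis, ...) so only base letters are compared."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def substitutions(normalized: str, written: str) -> list[tuple[str, str]]:
    """Return (normalized, written) letter sequences the alignment marks as replaced."""
    a, b = base_letters(normalized), base_letters(written)
    ops = SequenceMatcher(None, a, b).get_opcodes()
    return [(a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag == "replace"]


# (normalized form, written form) pairs from the Die Lilie example
PAIRS = [("daz", "dat"), ("von", "van"), ("tôde", "dode"), ("tumben", "dumben"),
         ("liuten", "luden"), ("gote", "gode"), ("ûf", "up")]

counts = Counter(pair for norm, wr in PAIRS for pair in substitutions(norm, wr))
print(counts.most_common())
```

For this handful of pairs the most frequent correspondence is normalized t written as d (tumben/dumben, tôde/dode, liuten/luden, gote/gode).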
Example of an indexed text {Die Lilie, 10v,01-11r,02}:
Lemma           Word class   Morphology  Normalized        Written          Word no.
¨dër            #pron        ^NSn        ~daz              \*(D*)at         03,32.03   11 :
¨sîn            #anv         ^3SGI       ~is               \is              03,32.04   11 :
¨wider-müète    #n           ^NS         ~wider-müète      \wider#mu\ode    03,32.05   11 :
¨von            #präp        ^           ~von              \van             03,32.06   11 :
¨wër(e)lt-lich  #adj         ^DP         ~wër(e)lt-lîchen  \wereltliche\-   03,32.07   11 :
¨schande        #f           ^DP         ~schanden         \$chanden.       03,32.08   11 :
¨oder           #konj        ^           ~obe              \ove             03,32.09@  11 :
¨von            #präp        ^           ~von              \van             03,33.01   11 :
¨tôd            #m<(u)>      ^DS         ~tôde             \dode            03,33.02   11 :
¨ir(e)          #pron poss   ^GSf        ~ir(e)r           \irer|           03,33.03   11 :
¨lîb-habede     #f           ^GS         ~lîb-habede       \|lifho(=)uede.  03,33.04   11 :
¨oder           #konj        ^           ~obe              \oue             03,33.05   11 :
¨von            #präp        ^           ~von              \van             03,33.06   11 :
¨ir(e)          #pron poss   ^GSm        ~ir(e)s           \ires            03,33.07   11 :
¨sëlb           #pron        ^GSm        ~sëlbes           \$elue$          03,33.08   11 :
¨siuchede       #f           ^DP         ~siucheden        \$uchede\-.(.)   03,33.09   11 :
¨dise           #pron        ^NSn        ~dit              \*(D*)it         03,33.10   11 :
¨ge-schëhen     #stv5        ^3SGI       ~ge-schiè{he}t    \ge$chit         03,33.11   11 :
¨dër            #art         ^DP         ~dën              \den             03,33.12   11 :
¨tumb           #adj         ^DP         ~tumben           \dumben          03,33.13@  11 :
¨liut           #mn          ^DP         ~liuten           \luden.,,        03,34.01   11 :
¨dër            #pron        ^NP         ~di               \di              03,34.02   11 :
¨dènken         #swv-a       ^3PGI       ~dènkent          \denkent,,       03,34.03   11 :
¨al-sô          #konj        ^           ~al-se            \al$e            03,34.04   11 :
¨ër             #pron        ^NP         ~si               \$i              03,34.05   11 :
¨sich           #pron        ^A          ~sich             \$ich            03,34.06   11 :
¨got            #m           ^DS         ~gote             \gode            03,34.07   11 :
¨næhen          #swv         ^3PGI       ~næhent           \nekent.,,       03,34.08   11 :
¨daz            #konj        ^           ~daz              \dat             03,34.09   11 :
¨ër             #pron        ^AP         ~si               \$i              03,34.10   11 :
¨nehèin         #pron        ^0          ~engèin           \en#gein         03,34.11   11 :
¨übel           #n           ^NS         ~übel             \vuel            03,34.12   11 :
¨be-rüèren      #swv         ^i          ~be-rüèren        \beruren         03,34.13   11 :
¨ne             #partikel    ^           ~en               \i\-|            03,34.14   11 :
¨sol(e)n        #anv         ^3SGK       ~sul(e)           \|$ule.(.)       03,34.15@  11 :
¨ër             #pron        ^NP         ~si               \*(S*)i          03,35.01   11 :
¨ne             #partikel    ^           ~en               \in              03,35.02   11 :
¨ge-dènken      #swv-a       ^3PGI       ~ge-dènkent       \gedenkent       03,35.03   11 :
¨dër            #art         ^GSm        ~dës              \des             03,35.04   11 :
¨ge-rëht        #adj         ^GSmw       ~ge-rëhten        \gerehten        03,35.05   11 :
¨mann           #m           ^GS         ~mannes           \ma\-(=)ne$      03,35.06   11 :
¨niht           #adv         ^           ~nièt             \niet.,,         03,35.07   11 :
¨Job            #EN          ^GS         ~jobes            \*(J*)obpes.,,   03,35.08   11 :
¨dër            #pron        ^ASm        ~dën              \den             03,35.09   11 :
¨got            #m           ^NS         ~got              \got             03,35.10   11 :
¨sëlb           #pron        ^NSmw       ~sëlbe            \$elue           03,35.11   11 :
¨loben          #swv         ^3SV        ~lobete           \louede,,        03,35.12   11 :
¨daz            #konj        ^           ~daz              \dat             03,35.13@  11 :
¨ër             #pron        ^DSm        ~ime              \ime             04,01.01   11 :
¨nehèin         #pron        ^0NSm       ~engèin           \en#gein         04,01.02   11 :
¨mèn(ni)sche    #m           ^NS         ~mènsche          \men$che         04,01.03   11 :
¨in             #präp        ^           ~in               \in              04,01.04   11 :
¨sîn            #pron poss   ^DP         ~sînen            \$inen           04,01.05   11 :
¨zît            #fn          ^DP         ~zîten            \ciden           04,01.06   11 :
¨ûf             #präp        ^           ~ûf               \up              04,01.07   11 :
¨ërd-rîche      #n           ^DS         ~ërd-rîche        \ertriche        04,01.08   11 :
¨wësen          #stv5        ^3SVK       ~wære             \were            04,01.09   11 :
¨ge-lîch        #adj         ^-          ~ge-lîch          \gelich.,,       04,01.10   11 :
¨unde           #konj        ^           ~inde             \inde            04,01.11   11 :
¨gëben          #stv5        ^3SVI       ~gab              \gaf             04,01.12   11 :
¨ër             #pron        ^-/ASm      ~ime              \ime             04,01.13   11 :\[!]
¨doch           #adv         ^           ~doch             \doch            04,01.14@  11 :
¨dër            #art         ^DSm        ~dëme             \deme            04,02.01   11 :
¨vîand          #m           ^DS         ~vîande           \uiende          04,02.02   11 :
¨ze             #präp        ^           ~ze               \ze|             04,02.03   11 :
¨be-kor(e)n     #swv         ^iD         ~be-kor(e)ne      \|becorene       04,02.04   11 :
¨ane            #präp        ^           ~an               \an              04,02.05   11 :
¨sîn            #pron poss   ^DSn        ~sîneme           \$ineme          04,02.06   11 :
¨guot           #n           ^DS         ~guote            \gu\o(=)de.,     04,02.07   11 :
¨ane            #präp        ^           ~an               \an              04,02.08   11 :
¨sîn            #pron poss   ^DP         ~sînen            \$inen           04,02.09   11 :
¨kind           #n           ^DP         ~kinden           \kinden.         04,02.10   11 :
¨unde           #konj        ^           ~inde             \inde            04,02.11   11 :
¨ane            #präp        ^           ~an               \an              04,02.12   11 :
¨sîn            #pron poss   ^DSn        ~sîneme           \$ine(=)me       04,02.13@  11 :
¨vlèisch        #n           ^DS         ~vlèische         \ulei$che(.)     04,03.01   11 :
¨niè-mann       #pron subst  ^NS         ~niè-man          \*(N*)iman       04,03.02   11 :
¨ne             #partikel    ^           ~en               \en              04,03.03   11 :
¨wësen          #stv5        ^3SVI       ~was              \was             04,03.04   11 :
¨al-sô          #adv         ^           ~al-se            \al$e            04,03.05   11 :
¨guot           #adj         ^-          ~guot             \gut             04,03.06   11 :
¨al-sô          #konj        ^           ~al-se            \al$e            04,03.07   11 :
¨Job            #EN          ^NS         ~Job              \iob.,,          04,03.08   11 :
¨niè-mann       #pron subst  ^NS         ~niè-man          \niman           04,03.09   11 :
¨ne             #partikel    ^           ~en               \in              04,03.10   11 :
¨wërden         #stv3b       ^3SVI       ~wart             \wart            04,03.11   11 :
¨sô             #adv         ^           ~sô               \$o              04,03.12   11 :
¨sêre           #adv         ^           ~sêre             \$ere            04,03.13   11 :
¨be-kor(e)n     #swv         ^pV         ~be-kort          \becort          04,03.14@  11 :
¨al-sô          #konj        ^           ~al-se            \al$e            04,04.01   11 :
¨Job            #EN          ^NS         ~Job              \iob.(.)         04,04.02   11 :
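Since every index entry keeps its word number, the index can be regrouped by lemma to list all occurrences of a headword together with their written forms. A minimal Python sketch using a few rows of the indexed example above; IndexEntry and build_index are hypothetical names, and the field layout is an assumption rather than the project's storage format.

```python
from collections import defaultdict
from typing import NamedTuple


class IndexEntry(NamedTuple):
    lemma: str       # headword the word is linked to
    normalized: str  # generated normalized word form
    written: str     # written form, still carrying pre-editing codes
    word_no: str     # unique word number


def build_index(entries: list[IndexEntry]) -> dict[str, list[IndexEntry]]:
    """Group entries by lemma, keeping text order within each lemma."""
    index: dict[str, list[IndexEntry]] = defaultdict(list)
    for entry in entries:
        index[entry.lemma].append(entry)
    return dict(index)


# A few rows of the indexed example (Die Lilie)
ENTRIES = [
    IndexEntry("von", "von", "van", "03,32.06"),
    IndexEntry("von", "von", "van", "03,33.01"),
    IndexEntry("von", "von", "van", "03,33.06"),
    IndexEntry("Job", "jobes", "*(J*)obpes.,,", "03,35.08"),
    IndexEntry("Job", "Job", "iob.,,", "04,03.08"),
    IndexEntry("Job", "Job", "iob.(.)", "04,04.02"),
]

index = build_index(ENTRIES)
for entry in index["Job"]:
    print(entry.word_no, entry.normalized, entry.written)
```

Grouping by lemma brings the different spellings of one headword together, e.g. the three occurrences of Job written as *(J*)obpes.,, and iob. in this passage.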