Our previous experience with SMT into Turkish
(Durgar El-Kahlout and Oflazer, 2006) suggested that
exploiting sub-lexical structure would be a fruitful
avenue to pursue. This was based on the observation
that a single Turkish word would have to align with a
complete phrase on the English side, and that these
English phrases could sometimes be discontinuous.
Figure 1 shows a pair of English and Turkish
sentences that are aligned at the word (top) and morpheme
(bottom) levels. At the morpheme level, we
have split the Turkish words into their lexical morphemes,
while English words with overt morphemes
have been stemmed and those morphemes
marked with a tag.
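
As a rough illustration of the two representations, the following minimal sketch segments a toy sentence pair. It is not the preprocessing used in this work: the lookup tables stand in for a real morphological analyzer and tagger, the example pair ("in our houses" / "evlerimizde") is not taken from Figure 1, and the English tag +PL is a hypothetical label chosen for this sketch.

```python
# Toy lookup tables (assumptions for illustration only). A real system would
# obtain these segmentations from a morphological analyzer and a tagger.
TURKISH_SEGMENTS = {
    # ev (house) + plural + 1st person plural possessive + locative
    "evlerimizde": ["ev", "+lAr", "+HmHz", "+DA"],
}
ENGLISH_SEGMENTS = {
    # stem plus a hypothetical tag standing for the overt plural morpheme
    "houses": ["house", "+PL"],
}


def segment(sentence, table):
    """Replace each word by its morpheme tokens; unknown words stay whole."""
    tokens = []
    for word in sentence.split():
        tokens.extend(table.get(word, [word]))
    return tokens


english_words = "in our houses".split()
turkish_words = "evlerimizde".split()

english_morphemes = segment("in our houses", ENGLISH_SEGMENTS)
turkish_morphemes = segment("evlerimizde", TURKISH_SEGMENTS)

print(len(turkish_words), "Turkish word vs", len(english_words), "English words")
print(" ".join(english_morphemes))
print(" ".join(turkish_morphemes))
```

At the word level the single Turkish token has to align with the whole three-word English phrase, whereas at the morpheme level both sides have four tokens that can in principle be aligned one to one.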
The productive morphology