en
corpus
Meaning:
tr
külliyat (collected works, corpus); kitaplık (library); ana kısım (main body); kapital (capital)
One way to lower the number of errors in the Tatoeba Corpus would be to encourage people to only translate into their native languages instead of the other way around.
I would prefer to have a list of Italian words which aren't in the corpus.
The corpus is not structured as a table but as a graph.
The Tatoeba Corpus is not error-free. Due to the nature of a public collaborative project, this data will never be 100% free of errors.
Tom and Mary were on the verge of diving off the left edge of the sentence into the infinite corpus when they spotted, underneath them, a shoal of hungry contributors, teeth bared, ready to jump on them and shred every last one of their mistakes.
It's so easy to write good example sentences that, even if we accidentally delete a few good sentences in the process of getting rid of a whole lot of bad ones, I think we could drastically improve the quality of this corpus by doing a lot of deleting.
Please keep in mind that the proper tag to add to incorrect sentences is "@change", not "change". Corpus maintainers will only be informed if the right tag is used.
Tatoeba's corpus is heterogeneous in many dimensions.
I must confess that my modest and worthless work here has included the translation (usually from English to Portuguese and Esperanto) of model sentences about Tom and Mary or about Sami and Layla. But I believe that it is worthwhile to also have sentences of different kinds added to Tatoeba's millionfold and exemplary corpus, even if they are less modern or less current.
It's dangerous to assume that all of the sentences in the Tatoeba Corpus are correct and suitable for language study.
Added on 2015-10-12 by m1gin