Protolanguage reconstruction tool

This is the homepage of the protolanguage reconstruction tool. To view the most recently reconstructed tree of languages:

Click here

The program obtains this result in several steps:

Retrieving data from the internet
The data is retrieved from Wiktionary using web scraping. It consists of transcriptions of words in every available language. The words to look up are fixed in advance as a list, namely the Swadesh list, a well-known list compiled by the linguist Morris Swadesh for comparing languages and reconstructing protolanguages. The list can easily be changed, in which case all calculations are carried out over the new list.
or Retrieving data from a language catalogue
The data has been gathered in advance by linguists into a unified catalogue of languages with their corresponding Swadesh lists.
Calculating language distances
Once the data is collected, every language is compared with every other language, which takes on the order of n squared comparisons for n languages. Comparing two languages consists of comparing their individual words and summing the results. Word comparison uses the Levenshtein distance and the Hungarian algorithm.

Each language comparison yields a number indicating how far apart the two languages are; together these numbers form a distance matrix.
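
The distance step can be illustrated with a minimal Python sketch. It assumes that each language is given as a word list aligned by meaning, and it covers only the Levenshtein part: how the Hungarian algorithm is applied is not described here, so that step is left out. The function names and the sample vocabulary (plain spellings rather than transcriptions) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def language_distance(words_a: list[str], words_b: list[str]) -> int:
    """Sum of word distances; assumes both lists are aligned by meaning."""
    return sum(levenshtein(x, y) for x, y in zip(words_a, words_b))


def distance_matrix(vocab: dict[str, list[str]]) -> tuple[list[str], list[list[int]]]:
    """Compare every pair of languages and return (names, symmetric matrix)."""
    names = sorted(vocab)
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = language_distance(vocab[names[i]], vocab[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix


# Hypothetical two-word sample, for illustration only.
sample = {
    "english": ["water", "night"],
    "german": ["wasser", "nacht"],
    "dutch": ["water", "nacht"],
}
names, matrix = distance_matrix(sample)
print(names)   # ['dutch', 'english', 'german']
print(matrix)  # [[0, 2, 2], [2, 0, 4], [2, 4, 0]]
```

Because the matrix is symmetric, the sketch computes each pair once and mirrors the value, which halves the work without changing the quadratic growth.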
Building language tree
The language tree is built from the language distance matrix using the neighbour-joining algorithm.
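
As a rough illustration of neighbour joining, the following Python sketch builds a tree from a list of names and a symmetric distance matrix. It is a simplified version written for this page, not the program's own code: it returns only the branching structure as nested tuples and discards branch lengths, it assumes at least two languages, and the names `neighbor_joining`, `trees`, and `dist` are hypothetical.

```python
def neighbor_joining(names, matrix):
    """Return a nested-tuple tree from a symmetric distance matrix."""
    n0 = len(names)
    trees = {i: name for i, name in enumerate(names)}  # node id -> subtree
    dist = {i: {j: float(matrix[i][j]) for j in range(n0) if j != i}
            for i in range(n0)}
    next_id = n0
    while len(trees) > 2:
        n = len(trees)
        total = {i: sum(dist[i].values()) for i in trees}
        # Choose the pair that minimises the neighbour-joining Q criterion.
        a, b = min(
            ((i, j) for i in trees for j in trees if i < j),
            key=lambda p: (n - 2) * dist[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        # Distances from the new internal node to every remaining node.
        new_row = {k: 0.5 * (dist[a][k] + dist[b][k] - dist[a][b])
                   for k in trees if k not in (a, b)}
        for k, d in new_row.items():
            del dist[k][a], dist[k][b]
            dist[k][next_id] = d
        del dist[a], dist[b]
        dist[next_id] = new_row
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    left, right = trees.values()
    return (left, right)


# Hypothetical distances in which A is close to B and C is close to D.
example = [
    [0, 5, 7, 11],
    [5, 0, 8, 12],
    [7, 8, 0, 6],
    [11, 12, 6, 0],
]
print(neighbor_joining(["A", "B", "C", "D"], example))
# (('A', 'B'), ('C', 'D'))
```

Neighbour joining produces an unrooted tree; the sketch simply joins the last two remaining nodes at the top, so the position of the root carries no meaning.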
Reconstructing protolanguages
For each parent node of the language tree created in the previous step, a possible protolanguage is calculated from the vocabularies of its descendant languages. The algorithms used to construct the protolanguage are the Levenshtein distance and Dijkstra's algorithm.
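
How the two algorithms are combined is not spelled out here, so the following Python sketch shows only one possible reading. It runs Dijkstra's algorithm over the grid of edit operations between two daughter words, which yields a cheapest edit path whose cost equals the Levenshtein distance, and takes the form halfway along that path as a candidate ancestral word. The functions `edit_path` and `candidate_ancestor` and the midpoint rule are assumptions made for illustration.

```python
import heapq


def edit_path(a: str, b: str) -> list[str]:
    """Dijkstra over the edit grid; returns the forms on a cheapest path a -> b."""
    start, goal = (0, 0), (len(a), len(b))
    best = {start: 0}
    parent = {}
    heap = [(0, start)]
    while heap:
        cost, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            break
        if cost > best[(i, j)]:
            continue  # stale queue entry
        steps = []
        if i < len(a) and j < len(b):
            steps.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))  # match / substitute
        if i < len(a):
            steps.append((i + 1, j, 1))  # delete a[i]
        if j < len(b):
            steps.append((i, j + 1, 1))  # insert b[j]
        for ni, nj, w in steps:
            if cost + w < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = cost + w
                parent[(ni, nj)] = (i, j)
                heapq.heappush(heap, (cost + w, (ni, nj)))
    # Walk back from the goal to recover the visited grid cells.
    cells = [goal]
    while cells[-1] != start:
        cells.append(parent[cells[-1]])
    cells.reverse()
    # At cell (i, j) the prefix a[:i] has been rewritten into b[:j].
    forms = []
    for i, j in cells:
        form = b[:j] + a[i:]
        if not forms or forms[-1] != form:
            forms.append(form)
    return forms


def candidate_ancestor(a: str, b: str) -> str:
    """Assumed rule: take the form halfway along the cheapest edit path."""
    forms = edit_path(a, b)
    return forms[len(forms) // 2]


print(edit_path("water", "wasser"))
print(candidate_ancestor("water", "wasser"))
```

Several cheapest paths usually exist, so the intermediate form depends on how ties are broken; a full reconstruction would need a principled way to choose among them and to handle nodes with more than two descendants.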