31 July 2023

Statistical Restoration Greek New Testament

After many idle months here is a new blog entry on the availability of the Statistical Restoration Greek New Testament (StatResGNT) in the bibref tool. (A Linux version is available at Snapcraft.)

StatResGNT is the result of more than 20 years of hard research work by Alan Bunning, a theologian and computer science expert. It can be studied directly on the web site of the Center for New Testament Restoration, or in the many Bible software packages that are based on the Sword library.

In today's entry we take a closer look at how StatResGNT can be studied with the online version of bibref.


The command help informs the user about the available Bible translations. In this example, you can simply click on the word “help” and it will be copied and executed in the embedded window above. Of course, it can be typed manually as well.

Consider a difficult passage in John 7:37-38. In KJV, this reads: “In the last day, that great day of the feast, Jesus stood and cried, saying, If any man thirst, let him come unto me, and drink. He that believeth on me, as the scripture hath said, out of his belly shall flow rivers of living water.” (Use lookup KJV John 7:37 and lookup KJV John 7:38 to obtain this quickly.) Why is this difficult? It is hard to find which Old Testament passage is cited by Jesus.

Here we provide one possible answer. Maybe the text “as the scripture hath said” refers to the end of verse 37 and not to verse 38. That is, the scripture being cited would correspond to the text “If any man thirst, let him come unto me, and drink”.

We can compare the Greek texts in two versions. A well-known version is the Society of Biblical Literature's Greek New Testament (SBLGNT). To obtain the passage from this version we issue lookup SBLGNT John 7:37. Likewise, for the StatResGNT version, we can try lookup StatResGNT John 7:37. Both give the result “Εαν τις διψα ερχεσθω προς με και πινετω.” This is comforting!

On the other hand, StatResGNT contains references to a word database, namely Strong's Concordance (or, strictly speaking, a modern variant of it). This means that the words are also available in tokenized form. Use the input tokens StatResGNT John 7:37 to get the output 1722 1161 3588 2078 2250 3588 3173 3588 1859 2476 3588 2424 2532 2896 3004 1437 5100 1372 2064 4314 1473 2532 4095. We can learn that the last 8 words were spoken by Jesus, so we are interested in the last 8 tokens, from 1437 to 4095. A nice overview of this verse can also be found on the collation page at greekcntr.org (also developed by Alan Bunning).
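The selection of the last 8 tokens can be reproduced with a few lines of Python. This is only an illustrative sketch: the token string is copied from the output quoted above, and the variable names are hypothetical and not part of bibref.

```python
# Output of "tokens StatResGNT John 7:37", copied from the text above.
output = ("1722 1161 3588 2078 2250 3588 3173 3588 1859 2476 3588 2424 "
          "2532 2896 3004 1437 5100 1372 2064 4314 1473 2532 4095")

# Turn the space-separated output into a list of integer tokens.
verse_tokens = [int(t) for t in output.split()]

# Keep only the last 8 tokens, i.e. the words spoken by Jesus.
spoken = verse_tokens[-8:]

print(len(verse_tokens))  # 23
print(spoken)             # [1437, 5100, 1372, 2064, 4314, 1473, 2532, 4095]
```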

We cannot skip some manual work at the present state of our research project. So, by selecting some key words from the tokens 1437 5100 1372 2064 4314 1473 2532 4095, we find that 1372 (“thirst”), 2532 (“and”) and 4095 (“drink”) may be a good choice. We then construct the query search LXX 1372 2532 4095 12 to look for passages of at most 12 tokens that contain all 3 of these tokens. This is the result:

Read 3 tokens, searching for an extension of max. 12 tokens.
Found in Ruth 2:9+19 2:9-4 (tpos=667-678)
Found in Ruth 2:9+20 2:9-3 (tpos=668-679)
Found in Ruth 2:9+21 2:9-2 (tpos=669-680)
Found in Ruth 2:9+22 2:9-1 (tpos=670-681)
Found in Ruth 2:9+23 2:9 (tpos=671-682)
Found in Isaiah 29:8+10 29:8-22 (tpos=10483-10494)
Found in Isaiah 29:8+11 29:8-21 (tpos=10484-10495)
Found in Isaiah 29:8+12 29:8-20 (tpos=10485-10496)
Found in Isaiah 29:8+13 29:8-19 (tpos=10486-10497)
Found in Isaiah 29:8+14 29:8-18 (tpos=10487-10498)
Found in Isaiah 29:8+15 29:8-17 (tpos=10488-10499)
Found in Isaiah 29:8+16 29:8-16 (tpos=10489-10500)
Found in Isaiah 29:8+17 29:8-15 (tpos=10490-10501)
Found in Isaiah 29:8+18 29:8-14 (tpos=10491-10502)
Found in Isaiah 29:8+19 29:8-13 (tpos=10492-10503)
Found in Isaiah 29:8+20 29:8-12 (tpos=10493-10504)
Found in Isaiah 29:8+21 29:8-11 (tpos=10494-10505)
Found in Isaiah 55:1+1 55:1-6 (tpos=21492-21503)

So, why are we searching for an extension of max. 12 tokens? Well, because there is no match on 3 tokens, and the shortest match, on 4 tokens (Isaiah 29:8), is a false positive: “It shall even be as when an hungry man dreameth, and, behold, he eateth; but he awaketh, and his soul is empty: or as when a thirsty man dreameth, and, behold, he drinketh; but he awaketh, and, behold, he is faint, and his soul hath appetite: so shall the multitude of all the nations be, that fight against mount Zion.” The same holds for the match on 8 tokens (Ruth 2:9): “Let thine eyes be on the field that they do reap, and go thou after them: have I not charged the young men that they shall not touch thee? and when thou art athirst, go unto the vessels, and drink of that which the young men have drawn.”

But, by allowing a length of 12 tokens, we find Isaiah 55:1: “Ho, every one that thirsteth, come ye to the waters, and he that hath no money; come ye, buy, and eat; yea, come, buy wine and milk without money and without price.” This seems to be a much better match, and it seems plausible that Jesus indeed referred to this concept told by Isaiah.

Some explanation of the output: the notation Isaiah 55:1+1 55:1-6 means that the match was found in the tokenized version of Isaiah 55:1, from token 1+1=2 up to the token after which 6 more tokens remain in the verse (token 13 of 19). These are tokens 21492-21503 (12 tokens altogether) in Isaiah. A direct command that shows the tokenized verse is tokens LXX Isaiah 55:1. It outputs 3588 1372 4198 1909 5204 2532 3745 3361 2192 694 59 2532 4095 427 694 2532 5092 3631 2532, and indeed, the second token is 1372, the 6th one is 2532, and the 13th one is 4095.
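The arithmetic behind this notation can be checked with a short Python sketch. The helper span_in_verse is a hypothetical name introduced only for this illustration, and it covers only the simple case where the match starts and ends in the same verse; the token list is the output of tokens LXX Isaiah 55:1 quoted above.

```python
from typing import Tuple

def span_in_verse(verse_length: int, skipped_at_start: int,
                  skipped_at_end: int) -> Tuple[int, int]:
    """Return the 1-based positions of the first and last matched token
    inside one verse, given a notation like 55:1+1 55:1-6
    (hypothetical helper, not a bibref function)."""
    first = skipped_at_start + 1
    last = verse_length - skipped_at_end
    return first, last

# Tokenized Isaiah 55:1 in LXX, copied from the text above.
isaiah_55_1 = [3588, 1372, 4198, 1909, 5204, 2532, 3745, 3361, 2192, 694,
               59, 2532, 4095, 427, 694, 2532, 5092, 3631, 2532]

first, last = span_in_verse(len(isaiah_55_1), 1, 6)
matched = isaiah_55_1[first - 1:last]  # convert to 0-based slicing

print(first, last)               # 2 13
print(len(matched))              # 12
print(matched[0], matched[-1])   # 1372 4095
print(21503 - 21492 + 1)         # 12, the same length from the tpos range
```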

In fact, the search command ignores the ordering of the input tokens. That is, the searched 3 tokens could be given in an arbitrary order. So we can say that we are searching for a tokenset in a sequence of tokens.
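To illustrate the idea, here is a minimal Python sketch of such an order-independent search. It is not bibref's actual implementation: the function name find_windows and its parameters are hypothetical, and the sketch simply slides a window of fixed length over a single verse instead of a whole book.

```python
from typing import List, Sequence, Tuple

def find_windows(tokens: Sequence[int], query: Sequence[int],
                 length: int) -> List[Tuple[int, int]]:
    """Return the 1-based (first, last) positions of every window of
    `length` consecutive tokens that contains all query tokens,
    regardless of their order (hypothetical helper, not bibref code)."""
    wanted = set(query)  # order of the query tokens is irrelevant
    hits = []
    for start in range(len(tokens) - length + 1):
        window = tokens[start:start + length]
        if wanted.issubset(window):
            hits.append((start + 1, start + length))
    return hits

# Tokenized Isaiah 55:1 in LXX, copied from the text above.
isaiah_55_1 = [3588, 1372, 4198, 1909, 5204, 2532, 3745, 3361, 2192, 694,
               59, 2532, 4095, 427, 694, 2532, 5092, 3631, 2532]

print(find_windows(isaiah_55_1, [1372, 2532, 4095], 12))  # [(2, 13)]
print(find_windows(isaiah_55_1, [4095, 1372, 2532], 12))  # [(2, 13)]
print(find_windows(isaiah_55_1, [1372, 2532, 4095], 11))  # []
```

Within this verse the only window of 12 tokens that contains all three query tokens runs from token 2 to token 13, which agrees with the notation Isaiah 55:1+1 55:1-6, and a window of 11 tokens is already too short.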

We should remark that tokenization is a matter of taste and could be done differently from how it is provided now. The current tokenization is based on Strong's numbers, checked and updated by Alan Bunning. So we simply rely on his work when using a token-based search.

Tokens seem to be very useful to find fuzzy matches. My current research interest is to find effective algorithms that can find matching tokensets in the Old Testament and the New Testament.


Entries on the topic of internal references in the Bible

  1. Web version of bibref (12 January 2022)
  2. Order in chaos (17 January 2022)
  3. Reproducibility and imperfection (20 January 2022)
  4. A student of Gamaliel's (23 January 2022)
  5. Non-literal matches in the Romans (26 January 2022)
  6. Literal matches: minimal uniquity and maximal extension (31 January 2022)
  7. Literal matches: the minunique and getrefs algorithms (1 February 2022)
  8. Non-literal matches: Jaccard distance (2 February 2022)
  9. Non-literal matches in the Romans: Part 2 (3 February 2022)
  10. A summary on the Romans (5 February 2022)
  11. The Psalms (6 February 2022)
  12. The Psalms: Part 2 (7 February 2022)
  13. A classification of structure diagrams (15 February 2022)
  14. Isaiah: Part 1 (19 February 2022)
  15. Isaiah: Part 2 (26 February 2022)
  16. Isaiah: Part 3 (2 March 2022)
  17. Isaiah: Part 4 (7 March 2022)
  18. Isaiah: Part 5 (15 March 2022)
  19. Isaiah: Part 6 (23 March 2022)
  20. Isaiah: Part 7 (30 March 2022)
  21. A summary (7 April 2022)
  22. On the Wuppertal Project, concerning Matthew (17 July 2022)
  23. Matthew, a summary (25 July 2022)
  24. Isaiah, a second summary (31 July 2022)
  25. Long false positives (23 August 2022)
  26. A general visualization (25 August 2022)
  27. Stephen's defense speech (19 September 2022)
  28. Statistical Restoration Greek New Testament (31 July 2023)
  29. Qt version of bibref (11 March 2024)

Zoltán Kovács
Linz School of Education
Johannes Kepler University
Altenberger Strasse 69
A-4040 Linz