
23236972? ago

Part 6 >

A.3.3 The Sample B3.

Chapter 10 of Genesis contains the names of the seventy descendants of Noah's sons: the Semites, the Hamites and the Japhetites (see Table 8). Jewish tradition teaches that these seventy descendants became the Seventy Nations, which constitute Humanity. The source for this tradition is found in Targum Yonathan (Jonathan's Translation) to Deuteronomy 32:8 (see Kasher, 1983, p.294). This concept is well known, and is found in biblical encyclopedias under the title "the Table of Nations" (see, for instance, Encyclopedia Biblica, 1962).

Jewish tradition further tells us (Hagra, 1905) that a nation has the following four characteristics:

(1) its name,
(2) its country,
(3) its language,
(4) its script.

Sample B3 is constructed on the basis of the "Table of Nations" using these four categories. Let us denote

$TN$ := the Table of Nations.

For every $x \in TN$ we define below a set $\mathrm{List\ B3}(x)$ of expressions.

For every $x, y \in TN$ we define

$\mathrm{Pair\ B3}(x,y) := \{(w,y) \mid w \in \mathrm{List\ B3}(x) \text{ and } w \text{ has from 5 to 8 letters}\}$.

The resulting sample is:

$\mathrm{Sample\ B3} := \bigcup_{x \in TN} \mathrm{Pair\ B3}(x,x)$.
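The set-builder definitions above translate directly into a few lines of code. The following is only a sketch: the transliterated strings are hypothetical stand-ins for the Hebrew expressions of Table 8 (one character per Hebrew letter), and the function names `pair_b3` and `sample_b3` are illustrative, not taken from the authors' software.

```python
def pair_b3(list_b3, x, y, min_len=5, max_len=8):
    """Pair B3(x, y): pairs (w, y) with w in List B3(x) and 5 <= len(w) <= 8."""
    return {(w, y) for w in list_b3[x] if min_len <= len(w) <= max_len}


def sample_b3(list_b3):
    """Sample B3: the union of Pair B3(x, x) over every name x in the table."""
    sample = set()
    for x in list_b3:
        sample |= pair_b3(list_b3, x, x)
    return sample


# Hypothetical transliterations, NOT the entries of Table 8.
toy_lists = {
    "knan": ["amknan", "knani", "arcknan"],
    "ywn": ["ywni", "arcywn"],   # "ywni" has 4 letters and is filtered out
}
toy_sample = sample_b3(toy_lists)
```

Note that the length restriction applies only to the expression w (searched as an ELS); the name y it is paired with is not restricted.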

Now let us explain how List B3(x) is constructed. Take, for example, one of the names in the list: Canaan (item 18 in Table 8).

(1) To express the fact that Canaan is the name of a nation, we use the Biblical expressions meaning "the Nation of Canaan" and "the Canaanites".
(2) For the Country of Canaan there is a Biblical expression meaning "the country of Canaan".
(3) For the Language of Canaan there is a Biblical expression meaning "the language of Canaan".
(4) For the Script of Canaan there is no attested Biblical expression, but the corresponding expression ("the script of Canaan") is a valid Biblical formation.

In the course of history, the names of nations and countries changed. In certain cases Jewish tradition, as expressed by Targum Yonathan to Genesis, gives the new names for the nations and countries under consideration. We add these new names to the corresponding categories (1) and (2). Thus, for example, the new name for Javan (item 4 in Table 8) is Macedonia, whereas for Canaan there are no new names.

The same procedure is applied to each name from the Table of Nations. For every $x \in TN$, List B3(x) consists of the following expressions:

(1) the expression meaning "the nation of x", the form of x meaning "the people of x", and NewNationName(x);
(2) the expression meaning "the country of x", and NewCountryName(x);
(3) the expression meaning "the language of x";
(4) the expression meaning "the script of x".

In Table 8, for every name $x \in TN$ the corresponding List B3(x) is given.

Remark 1. The expressions meaning "the nation of x", "the country of x" and "the language of x" were chosen for categories (1), (2) and (3) because, at least for some $x \in TN$, they appear as Biblical expressions (see the above example of Canaan). For category (4) we used the expression meaning "the script of x"; although it does not appear for any $x \in TN$, it is the only Biblical formation suitable for this category.

Remark 2. Concerning the expressions meaning "the people of x", we are aware of the fact that only valid linguistic formations should be used; therefore we use these expressions only if at least their singular form appears in the Hebrew Bible. For instance, for item 1 we do not take this form, since it does not appear in the Hebrew Bible, so we cannot know whether it has any meaning at all. The check was done using Even-Shoshan's New Concordance of the Bible (Even-Shoshan, 1981).

Remark 3. In Talmudic Literature we find various new names for nations and countries, and we cannot decide in favor of one against the other. We preferred to use a single source. The Aramaic translation of Genesis that gives the largest number of such new names is Targum Yonathan. We used Targum Yonathan as printed in Torah Shelemah (Kasher, 1929) (with Torah Shelemah's corrections according to the Ginzburger manuscript and others).

Targum Yonathan identifies the new name for the country for items 1 to 15, 17, and 19 to 23 in Table 8; and the new name for the nation for items 24 to 33, and 41 to 44.

Targum Yonathan is a translation into Aramaic. We need the identification of the names, but we do not intend to check whether the Aramaic appears non-randomly as ELS's. Thus, where the Hebrew formation exists, we used it alone.

Remark 4. Here, as in Witztum et al. (1994), we used the grammatical orthography (ktiv dikduki).

A.4 The Permutations.

Here we define the permuted samples for B1, B2 and B3.

A.4.1 Samples B1 and B2.

Let $\pi$ be a permutation of the set HA'. Then we define the permuted sample

$\mathrm{Sample\ B1}(\pi) := \bigcup_{x \in HA'} \mathrm{Pair\ B1}(x, \pi(x))$.

The set HA' consists of 12 elements, thus there are 12! different permutations. To calculate significance levels, we chose 999,999 random permutations as described in Section A.7.

For Sample B2 we proceed similarly.

A.4.2 Sample B3.

For a permutation $\pi$ of the set TN we define the permuted sample

$\mathrm{Sample\ B3}(\pi) := \bigcup_{x \in TN} \mathrm{Pair\ B3}(x, \pi(x))$.

The set TN consists of 68 different elements (two of the names appear twice), thus there are 68! different permutations. To calculate significance levels, we first chose 999,999 random permutations as described in Section A.7. Then we did a new measurement with 999,999,999 random permutations, only for statistic P2.

A.5 The Text.

We used the standard, generally accepted text of Genesis known as the "Textus Receptus." One widely available edition is that of the Koren Publishing Company in Jerusalem. The Koren text is precisely the same as that used by us.

A.6 The Overall Proximity Measures P1 and P2.

Let N be the number of word pairs (w, w') in the sample for which the corrected distance c(w,w') is defined (see Sections A.2 and A.3 above). Let k be the number of such word pairs (w,w') for which $c(w,w') \le 1/5$. Define

$$P_1 := \sum_{j=k}^{N} \binom{N}{j} (0.2)^j (0.8)^{N-j}.$$

To understand this definition, note that if the c(w, w') were independent random variables that are uniformly distributed over [0,1], then P1 would be the probability that at least k out of N of them are $\le 0.2$. However, we do not make or use any such assumptions about uniformity and independence. Thus P1, though calibrated in probability terms, is simply an ordinal index that measures the number of word pairs in a given sample whose words are "pretty close" to each other (i.e. $c(w,w') \le 1/5$), taking into account the size of the whole sample. It enables us to compare the overall proximity of the word pairs in different samples; specifically, in the samples arising from the different permutations $\pi$.
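As a small illustration, P1 can be computed as an upper binomial tail: the probability that at least k of N independent uniform values fall at or below 0.2, which is the reading given in the paragraph above. This is a sketch only; the function name `p1` and the input format (a plain list of the defined corrected distances) are assumptions of this example.

```python
from math import comb


def p1(distances, threshold=0.2):
    """Binomial tail: P(at least k of N uniform values are <= threshold).

    distances: the corrected distances c(w, w') that are defined for the sample.
    """
    n = len(distances)
    k = sum(1 for c in distances if c <= threshold)
    return sum(
        comb(n, j) * threshold**j * (1 - threshold) ** (n - j)
        for j in range(k, n + 1)
    )
```

For two pairs with distances 0.1 and 0.5 we have N = 2 and k = 1, so P1 = 2(0.2)(0.8) + (0.2)^2 = 0.36.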

The statistic P1 ignores all distances c(w,w') greater than 0.2, and gives equal weight to all distances less than 0.2. For a measure that is sensitive to the actual size of the distances, we calculate the product $\prod c(w,w')$ over all word pairs (w,w') in the sample. We then define

$$P_2 := F_N\Big(\prod c(w,w')\Big),$$

with N as above, and

$$F_N(X) := X \left(1 - \ln X + \frac{(-\ln X)^2}{2!} + \cdots + \frac{(-\ln X)^{N-1}}{(N-1)!}\right).$$

To understand this definition, note first that if $x_1, x_2, \ldots, x_N$ are independent random variables that are uniformly distributed over [0,1], then the distribution of their product $X := x_1 x_2 \cdots x_N$ is given by $\mathrm{Prob}(X \le X_0) = F_N(X_0)$; this follows from (3.5) in Feller (1966), since the $-\ln x_i$ are distributed exponentially, and $-\ln X = \sum_i (-\ln x_i)$. The intuition for P2 is then analogous to that for P1: if the c(w, w') were independent random variables that are uniformly distributed over [0,1], then P2 would be the probability that the product $\prod c(w,w')$ is as small as it is, or smaller. But as before, we do not use any such uniformity or independence assumptions. Like P1, the statistic P2 is calibrated in probability terms; but rather than thinking of it as a probability, one should think of it simply as an ordinal index that enables us to compare the proximity of the words in word pairs arising from different permutations.
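The following sketch evaluates P2 under the reading given above, namely that F_N is the distribution function of a product of N independent uniform variables, F_N(X) = X · Σ_{i=0}^{N-1} (-ln X)^i / i!. The function names are illustrative; for large N and very small products a real implementation would probably need to work with logarithms, which is not attempted here.

```python
from math import factorial, log, prod


def f_n(x, n):
    """Distribution function of a product of n independent uniform(0,1) values."""
    if x <= 0.0:
        return 0.0
    t = -log(x)                      # -ln X is a sum of n exponential variables
    return x * sum(t**i / factorial(i) for i in range(n))


def p2(distances):
    """P2 = F_N(product of the defined corrected distances)."""
    return f_n(prod(distances), len(distances))
```

For example, two distances of 0.5 give a product of 0.25 and P2 = 0.25 · (1 + ln 4), roughly 0.597.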

A.7 The Randomizations.

The 999,999 random permutations for the three tests were chosen in accordance with Algorithm P of Knuth (Knuth, 1969, page 125). The pseudorandom generator required as input to this algorithm was that provided by Turbo Pascal 5.0 of Borland International, Inc. We used the same seed as in Witztum et al. (1994): 01001 10000 10011 11100 00101 00111 11.

The control text V was constructed by permuting the verses of G with a single random permutation, generated as in the previous paragraph. In this case, the seed was picked arbitrarily to be the decimal integer 10 (i.e., the binary integer 1010).
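Knuth's Algorithm P is the shuffle now commonly called Fisher-Yates. A minimal sketch follows; it uses Python's built-in generator rather than the Turbo Pascal 5.0 generator named above, so it will not reproduce the permutations used in the paper, and the seed below merely echoes the decimal seed 10 for illustration.

```python
import random


def algorithm_p(items, rng):
    """Return a uniformly random permutation of items (Knuth's Algorithm P)."""
    a = list(items)
    for j in range(len(a) - 1, 0, -1):
        k = rng.randrange(j + 1)     # uniform index in 0..j
        a[j], a[k] = a[k], a[j]      # swap position j with the chosen position
    return a


rng = random.Random(10)              # assumption: Python's generator, not Borland's
perm = algorithm_p(range(68), rng)   # one random permutation of 68 names
```

The same routine applied to a list of verses would give a verse permutation of the kind used for the control text V.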

Acknowledgment.

We express our gratitude to Yaakov Rosenberg who prepared the software for the permutation test.

We thank Mr. Bernard Goldstein of London and Mr. Yaakov Aharonov of Bnei Brak for their help with computer facilities.

We express our sincere gratitude to Mishmereth Stam (Bnei Brak) for the text of the Book of Genesis on a disc.

We wish to express our thanks to Dr. Shalom Srebrenik for helpful discussions and valuable suggestions.

We thank Yaakov Orbach for help on linguistic matters.

REFERENCES

Cordovero, M. (1592) Pardes Rimmonim, Cracow.

Encyclopedia Biblica (1962), Jerusalem: Bialik Institute.

Even-Shoshan, A. (1981) A New Concordance of the Bible, Jerusalem: Kiriath Sefer.

Even-Shoshan, A. (1989) A New Dictionary of the Hebrew Language, Jerusalem: Kiriath Sefer.

Feller, W. (1966) An Introduction to Probability Theory and Its Applications, 2, New York: Wiley.

Hagra, (1905) A Commentary to the Book of Job, Jerusalem.

Kasher, M. M. (1929) Torah Shelemah. Talmud-Midrashic Encyclopedia on the Pentateuch, 1, Jerusalem: Beth Torah Shelemah.

Kasher, M. M. (1953) Torah Shelemah. Talmud-Midrashic Encyclopedia on the Pentateuch, 35, Jerusalem: Beth Torah Shelemah.

Knuth, D. E. (1969) The Art of Computer Programming, 2, Reading: Addison-Wesley.

Rabbenu Bahya, (1492) A Commentary on the Pentateuch, Naples.

Weissmandel, H. M. D. (1958) Torath Hemed. Mt. Kisco: Yeshivath Mt. Kisco.

Witztum, D., Rips, E. and Rosenberg, Y. (1994) Equidistant Letter Sequences in the Book of Genesis. Statistical Science, 9, No. 3, 429-438.

23236945? ago

Part 5 >

A randomization test suited to samples of this type, where the same word is "paired" with a list of expressions, is the subject of our next paper.

Sample B2 deals with women's names. We describe this fact exactly as we did for men's names, i.e. by the following pairs:

(of women) - (names),
(of the women) - (names),
(for women) - (names),
(of women) - (names),
(of the women) - (names).

Here too, the word meaning "names" is taken in the above-mentioned appearance as an SL.

Table 5 gives the values of c(w,w') for each pair from the above list (where w' is the word meaning "names", and w is the corresponding expression).

To do a preliminary test for Sample B3, we proceed in a similar way and check the following expressions with the same appearance of the word meaning "names" as above:

(of nations) - (names),
(of the nations) - (names),
(for nations) - (names),
(of nations) - (names),
(of the nations) - (names).

We checked more specifically the subject of Sample B3. The title "The Seventy Nations" can also be written with a numeral letter in place of the word "seventy" (using the well-known numerical values of the Hebrew letters, in which the letter ayin stands for 70). The alternative title in use is "The Seventy Descendants of Noah". (Notice that the Hebrew word used here means both the sons and the descendants of Noah.) This expression can also be written in the same way, with the same letter standing for 70. Thus we have the pairs:

(The Seventy Nations) - (names),
(The Seventy Nations) - (names),
(The Seventy Descendants of Noah) - (names),
(The Seventy Descendants of Noah) - (names).

Table 6 gives the values of c(w, w') for each pair from the above list (where w' is the word meaning "names", and w is the corresponding expression). The total for the 9 pairs is: P1 = 0.0170, P2 = 0.0154.

Seeing these results as encouraging, we proceed to check even more closely the subject of Sample B3.

The expression meaning "the descendants of Noah" appears in G as an SL five times. So we can look directly for meetings between this expression and the word meaning "seventy" as an ELS:

1) (seventy) - (the descendants of Noah).

In Fig. 3 we see the best meeting of these expressions. This table is determined by an ELS for the word "seventy" with skip 28. We mark this appearance of "the descendants of Noah", and check how closely each of the other expressions in the following pairs, appearing as a minimal ELS, "hits" it:

2) (the nations) - (the descendants of Noah),
3) (The Seventy Nations) - (the descendants of Noah),
4) (The Seventy Nations) - (the descendants of Noah),
5) (The Seventy Descendants of Noah) - (the descendants of Noah),
6) (The Seventy Descendants of Noah) - (the descendants of Noah),
7) (some of the descendants of Noah) - (the descendants of Noah),
8) (his descendants) - (the descendants of Noah),
9) (the seventy descendants of him) - (the descendants of Noah),
10) (the seventy descendants of him) - (the descendants of Noah),
11) (some of his descendants) - (the descendants of Noah),
12) (his descendants) - (the descendants of Noah),
13) (the seventy descendants of him) - (the descendants of Noah),
14) (the seventy descendants of him) - (the descendants of Noah),
15) (some of his descendants) - (the descendants of Noah).

Two further Hebrew words used in these pairs also mean "descendants". The expressions meaning "some of the descendants of Noah" and "some of his descendants" express the fact that not all of his descendants became nations.

In Fig. 4 we see the pairs 1), 2) and 5) appearing together. This table is determined by an ELS for "The Seventy Descendants of Noah" with skip -100, each row containing 25 = 100/4 letters. The ELS for "seventy" is the same as shown in Fig. 3. The ELS's for "The Seventy Descendants of Noah" and for "the nations" are minimal over the whole text of G.

Table 7 gives the values of c(w,w') for each pair from the above list (where w' is the expression meaning "the descendants of Noah", and w is the corresponding expression). The total for the 15 pairs is: P1 = 0.0000779, P2 = 0.0000377.

A.3.2 Sample B1.

"A Treasury of Men's Names" is included as an appendix in Even-Shoshan's Hebrew dictionary (Even-Shoshan, 1989). From this appendix we chose all the names which are mentioned in the Hebrew Bible (Pentateuch, Prophets and Writings) as proper names - as indicated in the Treasury itself - and contain 5 to 8 letters.

This collection of names is designated as S. Let HA := Hebrew Alphabet (which consists of 22 letters).

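For every $x \in HA$ we define

$W(x) := \{w \in S \mid w \text{ begins with the letter } x\}$.

We define 22 sets $A(x)$, $x \in HA$, where the elements of each set are the following three expressions:

(a name beginning with 'x'),
(names beginning with 'x'),
(names beginning with 'x').

The plural form of the Hebrew word for "name" is written in Genesis in two different spellings, which is why the plural expression appears twice.

Then we define

$W'(x) := \{w' \in A(x) \mid w' \text{ appears in the text with } d' = 1\}$.

Now we define

$HA' := \{x \in HA \mid W(x) \ne \emptyset \text{ and } W'(x) \ne \emptyset\}$,

and, for every $x, y \in HA'$,

$\mathrm{Pair\ B1}(x,y) := \{(w, w') \mid w \in W(x),\ w' \in W'(y)\}$.

For instance, a name w beginning with the letter x, together with an expression w' meaning "names beginning with 'x'" that appears in the text, forms a pair (w, w') in Pair B1(x,x).

The sample is defined by:

$\mathrm{Sample\ B1} := \bigcup_{x \in HA'} \mathrm{Pair\ B1}(x,x)$.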

For Genesis it contains 457 pairs of expressions.
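The construction of Sample B1 can be sketched as follows. The inputs are assumptions of this example: `names` stands for the collection S of 5 to 8 letter names, and `found_expressions` maps each letter x to those expressions of A(x) that were actually found in the text with skip 1, i.e. W'(x). The Latin strings are placeholders, not data from the paper.

```python
from collections import defaultdict


def sample_b1(names, found_expressions):
    """Union over x in HA' of Pair B1(x, x), as a set of (w, w') pairs."""
    w = defaultdict(set)
    for name in names:
        w[name[0]].add(name)             # W(x): names beginning with letter x
    # HA': letters for which both W(x) and W'(x) are non-empty.
    ha_prime = {x for x in w if found_expressions.get(x)}
    return {
        (name, expr)
        for x in ha_prime
        for name in w[x]
        for expr in found_expressions[x]
    }


toy = sample_b1(
    ["abram", "asher", "binyamin"],                   # placeholder names
    {"a": ["name-a", "names-a"], "g": ["names-g"]},   # placeholder W'(x)
)
```

Here only the letter "a" has both names and a found expression, so the toy sample holds 2 × 2 = 4 pairs.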

Sample B2 is defined in precisely the same way as Sample B1, except that we used "A Treasury of Women's Names" from the same dictionary. For Genesis it contains 38 pairs of expressions.

See Part 6 >

23236935? ago

Part 4 >

A.2 The Corrected Distance.

In the previous section we defined a measure $\Omega(w,w')$ of proximity between two words w and w' -- an inverse measure of the distance between them. We are, however, interested less in the absolute distance between two words than in whether this distance is larger or smaller than "expected". In this section, we define a "relative distance" c(w, w'), which is small when w is "unusually close" to w', and is 1, or almost 1, when w is "unusually far" from w'.

The idea is to use perturbations of the arithmetic progressions that define the notion of an ELS. Specifically, start by fixing a triple (x,y,z) of integers in the range {-r,...,0,...,r}; there are $(2r+1)^3$ such triples. In Witztum et al. (1994) and also here we put r = 2, which gives us 125 triples. Next, rather than looking for ordinary ELS's (n,d,k), look for "(x,y,z)-perturbed ELS's" $(n,d,k)^{(x,y,z)}$, obtained by taking the positions

n, n + d,...,n + (k - 4)d, n + (k - 3)d + x, n + (k - 2)d + x + y, n + (k - 1)d + x + y + z,

instead of the positions n, n + d, n +2d,...,n +(k - 1)d. Note that in a word of length k, k-2 intervals could be perturbed. However, we preferred to perturb only the 3 last ones, for technical programming reasons.
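The position formula above can be written out directly. In this sketch the function names are illustrative, and the check on k reflects the five-letter minimum stated in Section A.3.

```python
from itertools import product


def perturbed_positions(n, d, k, x=0, y=0, z=0):
    """Text positions of the (x, y, z)-perturbed ELS (n, d, k)."""
    if k < 5:
        raise ValueError("perturbations are applied only to words of 5+ letters")
    pos = [n + i * d for i in range(k)]
    pos[k - 3] += x              # third letter from the end: shifted by x
    pos[k - 2] += x + y          # second from the end: shifted by x + y
    pos[k - 1] += x + y + z      # last letter: shifted by x + y + z
    return pos


def all_triples(r=2):
    """All (x, y, z) with each component in {-r, ..., r}."""
    return list(product(range(-r, r + 1), repeat=3))
```

With r = 2 there are 125 triples, and the triple (0, 0, 0) gives back the ordinary ELS.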

The distance between the (x,y,z)-perturbed ELS $(n,d,k)^{(x,y,z)}$ and the SL (n', 1, k') is defined by the same formulae as in the non-perturbed case, where f is taken to be the distance between the first two letters of the (x,y,z)-perturbed ELS.

We may now calculate the "(x,y,z)-proximity" of two words w and w' in a manner exactly analogous to that used for calculating the "ordinary" proximity $\Omega(w,w')$. This yields 125 numbers $\Omega^{(x,y,z)}(w,w')$, of which $\Omega(w,w') = \Omega^{(0,0,0)}(w,w')$ is one. We are interested in only some of these 125 numbers; namely, those corresponding to triples (x,y,z) for which there actually exist some (x,y,z)-perturbed ELS's in Genesis for w (the other $\Omega^{(x,y,z)}(w,w')$ vanish). Denote by M(w, w') the set of all such triples, and by m(w, w') the number of its elements.

Suppose (0,0,0) is in M(w, w'), i.e., w actually appears as an ordinary ELS (i.e., with x = y = z = 0) in the text. Denote by v(w,w') the number of triples (x,y,z) in M(w,w') for which $\Omega^{(x,y,z)}(w,w') \ge \Omega(w,w')$. If $m(w,w') \ge 10$ (again, 10 is an arbitrarily selected "moderate" number),

$c(w,w') := v(w,w')/m(w,w')$.

If (0,0,0) is not in M(w, w'), or if m(w, w') < 10 (in which case we consider the accuracy of the method as insufficient), we do not define c(w,w').

In words, the corrected distance c(w,w') is simply the rank order of the proximity $\Omega(w,w')$ among all the "perturbed proximities" $\Omega^{(x,y,z)}(w,w')$; if $\Omega(w,w')$ is tied with other $\Omega^{(x,y,z)}(w,w')$, half of these others are considered to "exceed" $\Omega(w,w')$. We normalize it so that the maximum distance is 1. A large corrected distance means that ELS's representing w are far away from the SL's representing w', on a scale determined by how far the perturbed ELS's for w are from the SL's for w'.
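The ranking step can be sketched as below. The input format is an assumption of this example: a dictionary mapping each triple in M(w,w') to its perturbed proximity. The tie handling follows the verbal description (the unperturbed value counts itself, strictly larger values count fully, tied values count half); a literal reading of the "greater than or equal" definition would count every tie fully, so this choice should be treated as an interpretation.

```python
def corrected_distance(proximities, min_triples=10):
    """Rank of the unperturbed proximity among the perturbed ones, scaled to (0, 1].

    proximities: dict {(x, y, z): proximity} for the triples in M(w, w').
    Returns None when c(w, w') is undefined.
    """
    base = proximities.get((0, 0, 0))
    m = len(proximities)
    if base is None or m < min_triples:
        return None
    others = [p for t, p in proximities.items() if t != (0, 0, 0)]
    larger = sum(1 for p in others if p > base)
    tied = sum(1 for p in others if p == base)
    v = 1 + larger + tied / 2        # assumption: half of the ties "exceed"
    return v / m
```

A small value means that few perturbed configurations are at least as compact as the unperturbed one.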

A.3 The Samples of Word Pairs.

In this Section we describe the three samples that were tested in this research. Each of them consists of pairs of expressions (w,w'), where according to our procedure, we are looking for w to appear as ELS's and for w' to appear as SL's.

Our method of rank ordering of ELS's based on (x,y,z)-perturbations requires that words have at least 5 letters to apply the perturbations. In addition, we found that for words with more than 8 letters, the number of (x,y,z)-perturbed ELS's which actually exist for such words was too small to satisfy our criteria for applying the corrected distance. Thus the words in our list are restricted in length to the range 5-8, exactly as in Witztum et al. (1994). However, there is no restriction on the words or expressions appearing as SL's.

A.3.1 Preliminary Tests.

Sample B1 deals with men's names. Appendix B in Even-Shoshan's Hebrew dictionary (Even-Shoshan, 1989) is "A Treasury of Private Names". It is divided into two parts: "A Treasury of Men's Names" and "A Treasury of Women's Names". Originally, we intended to check a sample based on the (much bigger) Treasury of Men's Names. The Treasury contains names from the Hebrew Bible and from various periods of the Hebrew language. We decided to include in our sample only names taken from the Hebrew Bible (they are indicated as such in this Treasury). These are original Hebrew names, or names which are etymologically Semitic or Hebrew (see the foreword to the Treasury).

We describe the subject of the sample by the following set of pairs of expressions:

1) (private) - (names),
2) (Semitic) - (names),
3) (Hebrew) - (names),
4) (original) - (names).

The Hebrew term for "The Hebrew Bible" is either Mikra or Tanach. We used both:

5) (from the Hebrew Bible) - (names),
6) (from the Hebrew Bible) - (names).

Then we describe the fact that we deal with men's names by the following pairs:

a) (of men) - (names),
b) (of the men) - (names),
c) (for men) - (names),
d) (of men) - (names),
e) (of the men) - (names).

The first pair is shown above in Fig. 1 (in the Introduction). There, a minimal ELS for the word "private" appears in close proximity to an appearance of the word "names" as an SL. We mark this appearance of "names", and check how closely each of the other expressions in the pairs 2) to 6) and a) to e), appearing as a minimal ELS, "hits" it. For example, this appearance of "names" is shown in Fig. 2: this table is determined by an ELS for "original", which is minimal in a section of the text comprising 86% of G.

Tables 4A and 4B give the values of c(w,w') for each pair from the above lists (where w' is (names), and w is the corresponding expression). Recall that c(w, w') is defined only when w appears as an ELS, and that w is restricted in length to the range 5-8.

The total for the 11 pairs is: P1 = 0.0000042, P2 = 0.000397 (for the definition of P1 and P2 see section A.6).

See Part 5 >

23236922? ago

Part 3 >

3. Results and Conclusions.

In Tables 1, 2 and 3 we present the results for the three samples. Table 1 shows the results for Sample B1. There we list the rank order of P1 and P2 among the 1,000,000 corresponding $P_1^\pi$ and $P_2^\pi$. Thus the entry 1139 for P2 means that for 1138 out of the 999,999 random permutations $\pi$, the statistic $P_2^\pi$ was smaller than P2. It follows that $\min_i \rho_i = .000442$, so $\rho_0 = 2 \min_i \rho_i = .000884$.

We conclude that for Sample B1 the null hypothesis is rejected with significance level (p-value) .000884.

The same calculations, using the same 999,999 random permutations, were performed for a control text V (see Witztum et al., 1994). The text V was obtained from G by permuting the verses of G randomly. (For details, see Appendix; Section A.7).

Table 1 gives the results of these calculations too. In the case of V, $\min_i \rho_i$ is approximately .169, which is non-significant.

Table 2 shows the results for Sample B2 for G. The results are non-significant. We saw no reason to perform any further tests for this sample.

Table 3 shows the results for Sample B3. Part A summarizes the results for G, as well as for the control text V, for 999,999 random permutations. In the case of V, $\min_i \rho_i$ is approximately .559.

In part B, the results for G for 999,999,999 random permutations are given. Notice that only the rank order of P2 was calculated. It turned out to be 4.

We conclude that for Sample B3 the null hypothesis is rejected with significance level (p-value) .000000004.

APPENDIX: Details of the Procedure.

In this Appendix we describe the procedure in sufficient detail to enable the reader to repeat the computations precisely. Some motivation for the various definitions is also provided.

In Section A.1 of this Appendix, a "raw" measure of distance between words is defined. Section A.2 explains how we normalize this raw measure to correct for factors like the length of a word and its composition (the relative frequency of the letters occurring in it). Section A.3 explains how the three samples B1, B2 and B3 are constructed. Section A.5 identifies the precise text of Genesis that we used. In Section A.6, we define and motivate the statistics P1 and P2. The details of task (iv) are described in Section A.4. Finally, Section A.7 provides the details of the randomization.

A.1 The Distance Between Words.

To define the "distance" between words, we must first define the distance between an ELS representing a word and a string of letters (SL) in the text (i.e. with d = 1) representing the other word. Before we can do that, we must define the distance between an ELS and an SL in a given array; and before we can do that, we must define the distance between individual letters in the array.

As indicated in Section 1, we think of an array as one long line that spirals down on a cylinder; its row length h is the number of vertical columns. To define the distance between two letters x and x', cut the cylinder along a vertical line between two columns. In the resulting plane each of x and x' has two integer coordinates, and we compute the distance between them as usual, using these coordinates. In general, there are two possible values for this distance, depending on the vertical line that was chosen for cutting the cylinder; if the two values are different, we use the smaller one.
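A possible way to compute this letter distance from text positions is sketched below. It rests on an interpretation that is not spelled out in the text: because the line spirals, moving the cut changes the column offset by h and the row offset by one, so a position difference of rows·h + cols can be read either as (rows, cols) or as (rows + 1, cols - h). The function name and the use of 0-based text positions are assumptions of this example.

```python
from math import hypot


def letter_distance(p, q, h):
    """Euclidean distance between text positions p and q on a cylinder of row length h."""
    rows, cols = divmod(abs(q - p), h)
    # Two readings of the same offset, one for each side of the cut;
    # the smaller of the two distances is used.
    return min(hypot(rows, cols), hypot(rows + 1, h - cols))
```

For h = 10, positions 0 and 9 are nine columns apart under one cut but only one row and one column apart under the other, giving a distance of √2.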

Next, we define the distance between a fixed ELS e and a fixed SL e' in a fixed cylindrical array. Set

f := the distance between consecutive letters of e,

f' := the distance between consecutive letters of e' (= 1),

l := the minimal distance between a letter of e and one of e',

and define $\delta(e,e') := f^2 + f'^2 + l^2 + 1$. We call $\delta(e,e')$ the distance between the ELS e and the SL e' in the given array; it is small if both fit into a relatively compact area.

Now there are many ways of writing Genesis as a cylindrical array, depending on the row length h. Denote by $\delta_h(e,e')$ the distance $\delta(e,e')$ in the array determined by h, and set $\mu_h(e,e') := 1/\delta_h(e,e')$; the larger $\mu_h(e,e')$ is, the more compact is the configuration consisting of e and e' in the array with row length h. Set e = (n, d, k) (recall that d is the skip). Of particular interest are the row lengths $h = h_1, h_2, \ldots$, where $h_i$ is the integer nearest to $|d|/i$ (1/2 is rounded up). Thus when $h = h_1 = |d|$, e appears as a column of adjacent letters, and when $h = h_2$, e appears either as a column that skips alternate rows or as a straight line of knight's moves. In general, the arrays in which e appears relatively compactly are those with row length $h_i$ with i "not too large."
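The row lengths h_i are easy to tabulate with integer arithmetic, since the nearest integer to |d|/i with halves rounded up equals floor((2|d| + i) / (2i)). A minimal sketch, with an illustrative function name:

```python
def row_lengths(d, count=10):
    """First `count` row lengths h_i: |d|/i rounded to nearest, halves rounded up."""
    d = abs(d)
    # Degenerate values (0 or 1 for small |d|) are not filtered out here.
    return [(2 * d + i) // (2 * i) for i in range(1, count + 1)]
```

For the skip -100 mentioned for Fig. 4, the fourth row length is 25, and for skip 28 the first four are 28, 14, 9 and 7.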

The above discussion indicates that if there is an array in which the configuration (e,e') is unusually compact, it is likely to be among those whose row length is one of the first ten $h_i$. (Here and in the sequel 10 is an arbitrarily selected "moderate" number.) So setting

$$\sigma(e,e') := \sum_{i=1}^{10} \mu_{h_i}(e,e'),$$

we conclude that $\sigma(e,e')$ is a reasonable measure of the maximal "compactness" of the configuration (e, e') in any array. Equivalently, it is an inverse measure of the minimum distance between e and e'.

Next, given a word w, we look for the most "noteworthy" occurrence or occurrences of w as an ELS in G. For this, we chose ELS's e = (n,d,k) with $|d| \ge 2$ that spell out w for which |d| is minimal over all of G, or at least over large portions of it. Specifically, define the domain of minimality of e as the maximal segment $T_e$ of G that includes e and does not include any other ELS $(\hat{n}, \hat{d}, k)$ for w with $|\hat{d}| < |d|$. The length of $T_e$, relative to the whole of G, is the "weight" we assign to e. Thus we define $\omega(e) := \lambda(T_e)/\lambda(G)$, where $\lambda(T_e)$ is the length of $T_e$, and $\lambda(G)$ is the length of G. For any two words w and w', we set

$$\Omega(w,w') := \sum \omega(e)\,\sigma(e,e'),$$

where the sum is over all ELS's e spelling out w and over all SL's e' spelling out w'. Roughly, $\Omega(w,w')$ measures the maximum closeness of the more noteworthy appearances of w as ELS's and w' as SL's in Genesis -- the closer they are, the larger is $\Omega(w,w')$.

When actually computing $\Omega(w,w')$, the size of the list of ELS's for w may be impractically large (especially for short words). It is clear from the definition of the domain of minimality that ELS's for w with relatively large skips will contribute very little to the value of $\Omega(w,w')$ due to their small weight. Hence, in order to cut the amount of computation we restrict beforehand the range of the skip, $|d| \le D(w)$ for w, so that the expected number of ELS's for w will be 10. This expected number equals the product of the relative frequencies (within Genesis) of the letters constituting w multiplied by the total number of all equidistant letter sequences with $2 \le |d| \le D$. (The latter is given by the formula $(D - 1)(2L - (k - 1)(D + 2))$, where L is the length of the text and k is the number of letters in w.) Abusing our notation somewhat, we continue to denote this modified function by $\Omega(w,w')$.
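The counting formula can be checked by summing, for each skip d with 2 ≤ |d| ≤ D, the L - (k - 1)|d| possible starting positions; both signs of d double the count. The sketch below also picks a skip bound; taking the smallest D whose expected count reaches 10 is an assumption of this example, since the text says only that the expected number "will be 10".

```python
def els_count(D, L, k):
    """Number of equidistant letter sequences of length k with 2 <= |d| <= D."""
    return (D - 1) * (2 * L - (k - 1) * (D + 2))


def skip_bound(letter_freqs, L, target=10):
    """Smallest D for which the expected number of ELS's reaches `target`.

    letter_freqs: relative frequency in the text of each letter of the word.
    """
    k = len(letter_freqs)
    p = 1.0
    for f in letter_freqs:
        p *= f                        # chance that a random sequence spells w
    max_d = (L - 1) // (k - 1)        # largest skip that still fits in the text
    for D in range(2, max_d + 1):
        if p * els_count(D, L, k) >= target:
            return D
    return max_d
```

In a toy text of 50 letters, a 5-letter word whose letter frequencies multiply to 0.05 reaches an expected count of 11.4 at D = 4.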

See Part 4 >

23236871? ago

Part 2 >

2. Outline of the Procedure.

In this section we describe the test in outline. In an appendix, sufficient details are provided to enable the reader to repeat the computations precisely, and so to verify their correctness.

We test the significance of the phenomenon on samples of pairs of related words. To do this we must do the following:

(i) define the notion of "distance" between any two words, so as to lend meaning to the idea of words in "close proximity";
(ii) define statistics that express how close, "on the whole," the words making up the sample pairs are to each other (some kind of average over the whole sample);
(iii) choose a sample of pairs of related words on which to run the test; and
(iv) determine whether the statistics defined in (ii) are "unusually small" for the chosen sample.

Notice that the procedure described here is identical to that used in Witztum et al. (1994), except for minor changes in the details of task (i), due to the fact that here we consider the "distance" between an ELS and a word in the string of letters (SL) of the text, and not the "distance" between two ELS's (as in Witztum et al., 1994).

Task (i) has several components. First, we must define the notion of "distance" between an ELS and an SL of the text in a given array; for this we use a convenient variant of the ordinary Euclidean distance. Second, there are many ways of writing a text as a two-dimensional array, depending on the row length; we must select one or more of these arrays, and somehow amalgamate the results (of course, the selection and/or amalgamation must be carried out according to clearly stated, systematic rules). Third, a given word may occur many times as an ELS in a text; here again, a selection and amalgamation process is called for. Fourth, we must correct for factors such as word length and composition. All this is done in detail in Sections A.1 and A.2 of the Appendix.

Next, we have task (ii), measuring the overall proximity of pairs of words in the sample as a whole. For this, we used two different statistics, P1 and P2, which are defined and motivated in the Appendix (Section A.6). Intuitively, each measures overall proximity in a different way. In each case, a small value of Pi indicates that the words in the sample pairs are, on the whole, close to each other.

To accomplish task (iii) we composed three samples (Sample B1, Sample B2 and Sample B3) of pairs of expressions (w, w'), where w's are words appearing as ELS's, and w' 's are words appearing as SL's (i.e. with d' =1).

A preliminary test was done for each sample, in order to check how the subject of the sample appears as ELS's and SL's in Genesis, and consequently to decide whether to test the sample itself. For details see Appendix, Section A.3.1.

Sample B1 is built on the basis of the Hebrew alphabet. For every letter 'x' of the Hebrew alphabet we consider pairs (w,w'), where the w' are found as SL's and have the meaning "a name beginning with 'x'" or "names beginning with 'x'", while the w are names beginning with 'x' taken from "A Treasury of Men's Names", which is included as an appendix in Even-Shoshan's famous Hebrew dictionary (Even-Shoshan, 1989). For a detailed definition of Sample B1 see Appendix (Section A.3.2).

Sample B2 is built in exactly the same way as Sample B1, except for the fact that the names are taken from "A Treasury of Women's Names" from the same dictionary.

Sample B3 is built on the basis of the list of the seventy descendants of Noah's sons: the Semites, the Hamites, and the Japhetites, found in Genesis Chapter 10. Jewish tradition teaches that these seventy descendants became the Seventy Nations which constitute Humanity. This concept is well known, and is usually found in biblical encyclopedias under the title "The Table of Nations" (see, for instance, Encyclopedia Biblica, 1962). Sample B3 consists of pairs (w, w') where the w' are names from this list, and the w are expressions from a fixed set of expressions describing basic aspects of nationality (such as name, country, language etc.). For details see Appendix, Section A.3.3 and Table 8.

Finally, we come to task (iv), the significance test itself. We apply the same procedure for all three samples. For Sample B3 we describe it here; for the other two samples the (similar) details are given in Appendix A.4.

The list of Seventy Nations consists of 68 different names (in two cases nations have the same name). For each of the 68! permutations $\pi$ of these names, we define the statistic $P_1^\pi$ obtained by permuting the names in accordance with $\pi$, so that Name i is matched with the set of expressions defined for Name $\pi(i)$. The 68! numbers $P_1^\pi$ are ordered, with possible ties, according to the usual order of the real numbers. If the phenomenon under study were due to chance, it would be just as likely that P1 occupies any one of the 68! places in this order as any other. Similarly for P2. This is our null hypothesis.

To calculate significance levels, we chose 999,999 random permutations $\pi$ of the 68 names; the precise way in which this was done is explained in the Appendix (Section A.7). Each of these permutations $\pi$ determines a statistic $P_1^\pi$; together with P1, we thus have 1,000,000 numbers. Define the rank order of P1 among these 1,000,000 numbers as the number of $P_1^\pi$ not exceeding P1; if P1 is tied with other $P_1^\pi$, half of these others are considered to "exceed" P1. Let $\rho_1$ be the rank order of P1, divided by 1,000,000; under the null hypothesis, $\rho_1$ is the probability that P1 would rank as low as it does. Define $\rho_2$ similarly (using the same 999,999 permutations in each case).

For Sample B3 we performed an additional test with 999,999,999 random permutations. In this case only the statistic $P_2$ was calculated. The time needed for the computation of $P_1$ for 999,999,999 random permutations is at present beyond our reach.

After calculating the probabilities $\rho_1$ and $\rho_2$, we must make an overall decision, for each sample, to accept or reject the null hypothesis. Thus the overall significance level (or p-value) for each sample, using the two statistics, is $\rho_0 := 2 \min_i \rho_i$.
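The rank order and the overall significance level described above can be sketched as follows. The function names are illustrative, and the rank counts the original statistic itself plus the permuted values below it plus half of the ties, which is how the tie rule is read here.

```python
def rank_order(observed, permuted):
    """Rank of `observed` among itself and the permuted statistics."""
    smaller = sum(1 for p in permuted if p < observed)
    tied = sum(1 for p in permuted if p == observed)
    return 1 + smaller + tied / 2    # half of the ties are taken to "exceed" it


def overall_p_value(rank1, rank2, total=1_000_000):
    """Twice the smaller of the two relative ranks."""
    return 2 * min(rank1, rank2) / total
```

If 1138 permuted values fall below the original statistic, its rank is 1139; two ranks of 442 and 1139 out of 1,000,000 give an overall level of 0.000884.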

See Part 3 >