Format. Appendix A will be useful to anyone wishing to use our free association norms to set up their own database. We use Omnis 5 for this purpose, but other databases will work too (e.g., FileMaster). The main advantage of such databases is that they can be used as on-line search-and-sort engines for creating lists of words with particular attributes.
The fields appearing in Appendix A are separated by commas in text format so that the document can be opened in a variety of different programs and databases, e.g., it can be opened in a column format in StatView, Excel, and other database programs. The files are labeled Cue-Target Pairs followed by a letter designation indicating that cues beginning with the designated letters can be found in this file, e.g., "Cue Target Pairs.A-B" means that normed words beginning with the letters A or B and their responses can be found in this file. In this format, data for 5,019 normed words and their 72,176 responses can be found. For each file, 31 data fields are presented so that the total matrix size when pooled across beginning letters is 31 columns by 72,176 rows. There are potential data entries for 2,237,456 cells in this matrix. A file containing the entire matrix was not provided because we thought that it would be too large to open on some computer systems. Instead, we provide smaller files based on 8 letter groupings, i.e., A-B, C, D-F, G-K, L-O, P-R, S, T-Z. Grouped in this way, the files are approximately the same size. This procedure was followed for the other appendices as well.
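As a rough illustration of working with the comma-separated format, the sketch below parses rows with Python's standard csv module and pools them across letter-group files. The header spellings in FIELDS (only the first seven of the 31 fields), the absence of a header line, and the in-memory sample text are all assumptions made for illustration; the one sample row reuses the ABILITY-CAPABILITY values given in the text.

```python
import csv
import io

# First seven of the 31 fields, in the order described in the text.
# The exact header spellings in the distributed files are an assumption.
FIELDS = ["CUE", "TARGET", "NORMED?", "#G", "#P", "FSG", "BSG"]

def read_pairs(handle, fields=FIELDS):
    """Yield one dict per cue-target row, keeping only the named leading fields."""
    for row in csv.reader(handle, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        yield dict(zip(fields, (cell.strip() for cell in row)))

def pool(handles):
    """Concatenate rows from several letter-group files into one list."""
    pooled = []
    for handle in handles:
        pooled.extend(read_pairs(handle))
    return pooled

# Stand-in for one file; a real file would be opened from disk instead.
sample_ab = io.StringIO("ABILITY, CAPABILITY, Y, 143, 17, .119, .282\n")
rows = pool([sample_ab])

# Size of the full pooled matrix reported in the text: 31 fields by 72,176 rows.
total_cells = 31 * 72176
```

Pooling all eight files this way would give the full 31-column matrix described above, for systems that can hold it in memory.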
Data. The first column or field in each file presents the normed words or Cues listed in alphabetical order, and the second field presents their responses or Targets. In this format, the cues and their responses (targets) are presented as pairs. We refer to these items as cue-target pairs because of how such items are selected for use in research in our area of memory. Targets are selected as words to be studied in memory experiments, and cues are used to prompt their recall. Given the wide variation in word properties, the norms are used for constructing lists of pairs that systematically vary in some properties while holding other properties constant.
As a result of incorporating the norms into a database program, our list construction processes have entered the computer age and it is now feasible to control certain word attributes while varying others with greater degrees of rigor than ever before. For example, search restrictions can be imposed on the targets in the pool so that the database reports only words that occur 50 or more times per million, that have a concreteness rating of 4.8 or greater, that have no fewer than 8 and no more than 16 associates, and whose associates are connected to an average of 3 other associates in the set. Instead of selecting words on only a single attribute such as frequency, they can be selected on the basis of a multitude of attributes while simultaneously holding other attributes constant. This capability also holds for pairs of related words. Instead of selecting attribute levels for manipulation blindly, the distribution of values can be plotted and then cutoffs marking extreme values can be set with full knowledge of the form of the distribution, its mean and its variance. Moreover, instead of selecting items to be representative of some particular dimension of interest, items can be selected randomly with normative values used after data collection to develop prediction equations for various tasks. In short, there may be no end to the uses to which a database of this sort might be applied. Our experience has been that list construction processes take more rather than less time since we created the database, but the final product is far superior because the "noise" resulting from uncontrolled factors can be substantially reduced. With less noise in the lists, more subtle main effects can be detected with greater ease and shy but theoretically interesting interaction effects become more bold.
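The kind of multi-attribute search described above might look like the following sketch. The record layout, the function name select_targets, the placeholder words, and their attribute values are all hypothetical. Reading "an average of 3" as a mean connectivity within 0.5 of 3.0 is also an assumption, since the text does not give a tolerance.

```python
from dataclasses import dataclass

@dataclass
class TargetInfo:
    word: str
    tfr: float   # printed frequency per million (TFR)
    tcon: float  # concreteness rating on the 1-7 scale (TCON)
    tss: int     # set size, i.e., number of strong associates (TSS)
    tmc: float   # mean connectivity among the associates (TMC)

def select_targets(pool, min_freq=50, min_con=4.8, ss_range=(8, 16),
                   mc_target=3.0, mc_tol=0.5):
    """Return words meeting every restriction at once."""
    lo, hi = ss_range
    return [
        t.word for t in pool
        if t.tfr >= min_freq
        and t.tcon >= min_con
        and lo <= t.tss <= hi
        and abs(t.tmc - mc_target) <= mc_tol  # tolerance is an assumption
    ]

# Placeholder records with made-up values, for illustration only.
pool = [
    TargetInfo("WORD_A", 120, 5.6, 12, 3.1),  # meets all restrictions
    TargetInfo("WORD_B", 30, 6.0, 10, 3.0),   # frequency too low
    TargetInfo("WORD_C", 75, 4.9, 20, 2.9),   # set size too large
    TargetInfo("WORD_D", 60, 4.8, 8, 2.5),    # sits on every boundary
]
selected = select_targets(pool)
```

Because every cutoff is a keyword argument, one attribute can be varied across lists while the others are held constant.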
In general, Appendix A can be used for selecting pairs of related words that have been produced by two or more subjects in free association, but by incorporating the information in Appendix A into a database program the materials can be manipulated and selected in much more sophisticated ways.
The remaining fields present information about the pairs or the individual words comprising them. The 3rd field, called NORMED?, indicates whether the target word in the pair has been normed by a separate group of participants. A Y stands for "yes" and indicates that the target has been normed and an N stands for "no" indicating that it has not been normed. Of the 72,176 responses or targets appearing in the database, 8,557 have not been normed and therefore cells that depend on normative information for these items have been left blank. This means that data are provided for only 63,619 of the 72,176 responses. These responses comprise the 5,019 normed words produced redundantly by different cues, e.g., 18 different words produce ABILITY as a response. The NORMED? field is particularly important for researchers wishing to select pairs with known forward (cue-to-target) and backward (target-to-cue) strengths. Those tempted to infer the strength of the backward connection from the strength of the forward connection should beware. The correlation between forward and backward strength for cues whose targets have been normed is positive but not high, r = .29 (n = 63,619), and the chances of correctly guessing backward strengths from knowledge of forward strengths are low.
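A correlation of this kind could be computed along the lines of the sketch below, which keeps only pairs whose NORMED? value is Y and then applies the usual Pearson formula. The dictionary layout and all rows except the ABILITY-CAPABILITY values are hypothetical, and the tiny sample is not meant to reproduce the reported r = .29.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

pairs = [
    {"NORMED?": "Y", "FSG": 0.119, "BSG": 0.282},  # ABILITY-CAPABILITY (from the text)
    {"NORMED?": "Y", "FSG": 0.40, "BSG": 0.05},    # hypothetical
    {"NORMED?": "N", "FSG": 0.30, "BSG": None},    # hypothetical; BSG cell is blank
    {"NORMED?": "Y", "FSG": 0.02, "BSG": 0.01},    # hypothetical
]

# Backward strength exists only where the target has itself been normed.
normed = [p for p in pairs if p["NORMED?"] == "Y"]
r = pearson_r([p["FSG"] for p in normed], [p["BSG"] for p in normed])

# Count of responses with normative data, as reported in the text.
n_with_data = 72176 - 8557
```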
The 4th field is called #G which stands for the number of participants serving in the group norming the word, and the 5th field is called #P for the number of participants producing a particular response. The 6th field is called FSG which stands for forward strength or what has sometimes been called cue-to-target strength. This value is calculated in the traditional way by dividing #P by #G which gives the proportion of subjects in the group who produce a particular target in the presence of the cue word. For example, for the word ABILITY, 17 out of the 143 participants in the group produced CAPABILITY as a response, so FSG for this pair is calculated to be .119. From this value we assume that it is reasonable to infer that the probability of producing CAPABILITY in the presence of ABILITY in the absence of studying either of these words in an experimental context is approximately .119. Each of the files in Appendix A was sorted first on the beginning letter of the normed cue word, then by FSG from highest to lowest, and then, within FSG, alphabetically by the target.
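The calculation and the sort order just described can be sketched as follows. The function names and row layout are hypothetical. Sorting on the whole cue word rather than only its first letter is an assumption made to keep each cue's responses together; the two rows use the ABILITY strengths quoted in the text.

```python
def forward_strength(n_producing, n_group):
    """FSG: proportion of the norming group (#G) producing the target (#P)."""
    return n_producing / n_group

def sort_rows(rows):
    """Order rows by cue, then FSG from highest to lowest, then target A-Z."""
    return sorted(rows, key=lambda r: (r["CUE"], -r["FSG"], r["TARGET"]))

fsg = forward_strength(17, 143)  # ABILITY -> CAPABILITY

rows = [
    {"CUE": "ABILITY", "TARGET": "COMPETENCE", "FSG": 0.06},
    {"CUE": "ABILITY", "TARGET": "CAPABILITY", "FSG": fsg},
]
ordered = sort_rows(rows)
```

Rounded to three places, forward_strength(17, 143) gives the .119 reported for this pair.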
The 7th field is called BSG which stands for backward strength or target-to-cue strength. The word "backward" here is apt to be confusing to some because BSG is measured in the same way as forward strength, except the word appearing as the "target" now serves as the "cue" to be normed instead of the reverse. The term backward simply follows the conventional but admittedly misleading use of the term in memory research. If it is important for some purpose to know #G and #P for the index of BSG, look up the word serving as the target in a given pair as a cue. For example, for CAPABILITY in the above pairing, 35 out of a group of 124 participants produced ABILITY as a response, so BSG in the ABILITY CAPABILITY pairing is calculated at 35/124 = .282.
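In other words, backward strength is forward strength with the roles of the two words swapped. A minimal sketch, assuming a hypothetical nested dictionary norms[cue][response] = (#P, #G) and hypothetical function names:

```python
# norms[cue][response] = (#P, #G); counts are the ones quoted in the text.
norms = {
    "ABILITY": {"CAPABILITY": (17, 143)},
    "CAPABILITY": {"ABILITY": (35, 124)},
}

def strength(norms, cue, response):
    """Proportion of the cue's group producing the response.

    Returns None when the cue itself has not been normed, and 0.0 when it
    has been normed but nobody produced the response.
    """
    if cue not in norms:
        return None
    entry = norms[cue].get(response)
    if entry is None:
        return 0.0
    n_producing, n_group = entry
    return n_producing / n_group

def backward_strength(norms, cue, target):
    """BSG for a cue-target pair: look the target up as a cue."""
    return strength(norms, target, cue)

bsg = backward_strength(norms, "ABILITY", "CAPABILITY")
```

Returning None for a non-normed target mirrors the blank cells left in the files for such items.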
The next 6 fields index indirect connections between the word pairs. FSG and BSG represent measures of direct strength because one word directly produces the other as an associate in free association. Indirect connections index links between related words that occur through other words. Such connections are often ignored in research applications of normative data but they can be very strong and can have large effects on memory performance in certain tasks (Nelson, Bennett, & Leibert, 1997; Nelson et al., 1998). The 8th field is named MSG for mediated strength, which is also sometimes called 2-step strength in the memory literature. For example, ABILITY produces competence as an associate with a probability of .06, which in turn produces capability as an associate with a probability of .08. The mediated strength of the ABILITY CAPABILITY pairing is calculated by cross multiplying the individual links and then summing the results across each link. Given that no other mediated links were detected for this pair, MSG was calculated as .06 * .08 = .0048. This particular pair has one 2-step mediated link, but some word pairs have no such connections whereas others have as many as 17. The highest calculated MSG in this database is .66 and it should be noted that indirect strength as indexed by this procedure sometimes exceeds direct strength.
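The cross-multiply-and-sum rule for MSG can be sketched as below. The dictionary strengths[word][associate] and the function name are hypothetical; the three strengths are the ones quoted in the text for this pair.

```python
def mediated_strength(strengths, cue, target):
    """Sum over mediators m of strength(cue -> m) * strength(m -> target).

    Returns the summed strength and the number of mediators found.
    """
    total = 0.0
    n_mediators = 0
    for mediator, cue_to_m in strengths.get(cue, {}).items():
        if mediator in (cue, target):
            continue  # the direct link is FSG, not a mediated link
        m_to_target = strengths.get(mediator, {}).get(target, 0.0)
        if m_to_target > 0:
            total += cue_to_m * m_to_target
            n_mediators += 1
    return total, n_mediators

strengths = {
    "ABILITY": {"COMPETENCE": 0.06, "CAPABILITY": 0.119},
    "COMPETENCE": {"CAPABILITY": 0.08},
}
msg, n_mediators = mediated_strength(strengths, "ABILITY", "CAPABILITY")
```

With the single mediator COMPETENCE, the sum reduces to .06 * .08 = .0048, matching the worked example.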
The 9th field is named OSG for overlapping strength. Two words comprising a particular pair may also have associates in common, what have sometimes been called overlapping, convergent or shared associates. The cue word and the target word may produce some of the same words as associates. For example, both ABILITY and CAPABILITY produce the same 6 words as associates, including able, strength, talent, potential, capacity, and knowledge. The overlap strength for this pair is calculated as shown in Table 2. From this example, it should be clear that OSG is calculated like MSG in that the strengths of the individual connections are cross multiplied and then summed.
|Example for calculating OSG.|
|Cue to Overlapping |
|Target to Overlapping |
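The same cross-multiply-and-sum logic applies to OSG, except that each product pairs a cue-to-associate strength with a target-to-associate strength for an associate the two words share. In the sketch below the function name, the data layout, and every strength value are hypothetical (they are not the values from Table 2). ABLE and TALENT are two of the six shared associates named in the text, and OTHER_1 and OTHER_2 are placeholders for unshared associates.

```python
def overlap_strength(strengths, cue, target):
    """Sum over shared associates a of strength(cue -> a) * strength(target -> a).

    Returns the summed strength and the sorted list of shared associates.
    """
    cue_assoc = strengths.get(cue, {})
    target_assoc = strengths.get(target, {})
    shared = sorted((set(cue_assoc) & set(target_assoc)) - {cue, target})
    total = sum(cue_assoc[a] * target_assoc[a] for a in shared)
    return total, shared

# Made-up strengths, for illustration only.
strengths = {
    "ABILITY": {"ABLE": 0.10, "TALENT": 0.05, "OTHER_1": 0.20},
    "CAPABILITY": {"ABLE": 0.20, "TALENT": 0.10, "OTHER_2": 0.03},
}
osg, shared = overlap_strength(strengths, "ABILITY", "CAPABILITY")
```

Here the result is .10 * .20 + .05 * .10 = .025; associates produced by only one of the two words contribute nothing.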
The next 9 fields provide information about the cue, information that is independent of its targets. Each field name contains the letter Q as an indication that the information presented is related to the cue or normed word. The 14th field provides a relative index of how many near neighbors the cue has, or what we generally call its cue set size, QSS. This index is calculated by counting the number of different responses or targets given by two or more participants in the normative sample. Some words have set sizes of 1.00 (e.g., LEFT) whereas others have set sizes of 30 or more different words (e.g., FARMER), and in general, set size closely approximates a normal distribution. The criterion of "two or more" participants was chosen many years ago on the assumption that idiosyncratic responses given by a single participant would tend to be "off the wall." The opinion was that such responses should not be counted as in the set because they would "vary with different walls" and would therefore be unreliable. However, after years of data collection it has become more clear that such responses make sense most of the time to an objective observer, so most are not "off the wall" as the senior author once thought. They are, however, unreliable because re-normings of hundreds of the same words showed that a completely different set of idiosyncratic responses was produced each time the word was normed (Nelson & Schreiber, 1992). Words given by two or more subjects tend to be highly reliable, as is the number of different words produced by the cue, regardless of whether they are given by two or more participants or by a single participant. What is different between normings are the specific idiosyncratic responses produced by a single participant.
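The "two or more participants" counting rule for set size can be sketched as follows. The function name and the response list are hypothetical; only the count of 17 for CAPABILITY comes from the text, and ODD_1 and ODD_2 stand in for idiosyncratic responses.

```python
from collections import Counter

def set_size(responses, min_participants=2):
    """Count distinct responses given by at least min_participants people."""
    counts = Counter(responses)
    return sum(1 for n in counts.values() if n >= min_participants)

# One entry per participant's response to a single cue (made-up sample).
responses = (
    ["CAPABILITY"] * 17
    + ["COMPETENCE"] * 9
    + ["ABLE"] * 3
    + ["ODD_1", "ODD_2"]  # idiosyncratic responses, one participant each
)
qss = set_size(responses)
n_different = set_size(responses, min_participants=1)
```

Lowering min_participants to 1 gives the total number of different words produced, which the text reports to be reliable as a count even though the specific single-participant responses are not.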
We now interpret these findings to mean that most words are linked to very large numbers of other words, links that presumably are created as a result of experience with words in spoken conversation, reading and thinking. Discrete free association norms, we believe, provide a reliable index of the number of strongest associates, or nearest neighbors in the sense of semantic distance. Even a response that is provided by only 2 out of 150 participants is regarded as a relatively strong associate. However, because idiosyncratic responses seem to be unreliable members of the set, we concluded that words are connected strongly to some of their associates and are very weakly connected to many other associates, associates that are produced out of context rarely and with some inconsistency. The lesson we take from these considerations is that discrete free association provides a very good indicator of the number of strong associates and a very poor indicator of the number of weak associates. Hence, we conclude that QSS provides a relative index of the set size of a particular word by providing a reliable measure of how many strong associates it has. Because it fails as an indicator of the number of weak associates, this index should not be construed as providing an index of absolute set size.
The 15th field presents the printed frequency of the cue, QFR, and these values were borrowed from the Kucera and Francis (1967) norms for the convenience of readers. The 16th field shows a concreteness rating on a scale of 1-7 for many of the words in the norms, QCON. Many but not all of these values were borrowed. First, we looked up a given word in the Paivio, Yuille and Madigan (1968) norms, and if the word was located, then its concreteness was entered into our database. If the word was not located in these norms, we then looked the word up in the Toglia and Battig norms (1978) and used this value. Finally, if the word was not in either source, we sometimes normed it ourselves using procedures described by Paivio et al. (1968). In this way, concreteness values are provided for 3,260 words for the convenience of readers (non-normed words have been left blank).
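The ordered lookup used for concreteness amounts to taking the first source in which a word appears. A minimal sketch, with a hypothetical function name, placeholder words, and made-up ratings:

```python
def first_available(word, sources):
    """Return (rating, source label) from the first source listing the word."""
    for label, ratings in sources:
        if word in ratings:
            return ratings[word], label
    return None, None  # left blank when no source has the word

# Sources in the order they were consulted; the ratings are made up.
sources = [
    ("Paivio, Yuille and Madigan (1968)", {"WORD_A": 6.5}),
    ("Toglia and Battig (1978)", {"WORD_A": 6.1, "WORD_B": 3.2}),
    ("normed by the authors", {"WORD_C": 4.0}),
]
rating, origin = first_available("WORD_A", sources)
```

A word listed in more than one source takes its value from the earliest source only, so later values for the same word are never consulted.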
The 17th field provides information on whether the cue is a homograph, QH. The information was also borrowed from other databases that separate the associates into two or more classes on the basis of different meanings. A blank space indicates that the cue word under consideration is probably not classified as a homograph, and a single letter indicates that it is a homograph or that it is likely to be one. The letters refer to the first letter of the first author associated with the homograph norms so that interested readers can pursue the source if desired. This information is provided in Table 3, and it should be noted that, as with concreteness ratings, sources were used in a particular ordering. This ordering can be described by arranging the letters of the authors from first to last used: N, P, W, T, G and C. Other than selecting what was handy at the time, no particular rationale was used in determining this ordering, but it does mean that some words will appear in more than one set of norms and this fact is not recognized here.
|Sources of homograph norms.|
|C||6||Cramer, P. (1970).|
|G||247||Not normed to our knowledge. Identified as likely homographs by Nancy Gee.|
|N||297||Nelson et al. (1980).|
|P||33||Perfetti et al. (1971).|
|T||167||Twilley et al. (1994).|
|W||48||Wollen et al. (1980).|
The 18th field presents the part-of-speech classification of the cue word, QPS, which was determined by the first part of speech listing in The American Heritage Dictionary of the English Language (1980). Only a single entry is provided for each word, even when, for example, a word can be classified as either a noun or a verb. Part of speech is indicated by the first letter or by two letters for each classification, and Table 4 provides the definitions.
|Definitions of parts of speech.|
|Abbreviation||Part of Speech|
The 19th field provides an index of the mean connectivity among the associates of the normed word, QMC. This measure is obtained by norming the associates of the cue word with separate groups of participants, counting the number of connections among the associates in the set, and then dividing by the size of the set (minus MIAS if there are any). This index captures the density and in some sense the level of organization among the strongest associates of the cue. The 20th field provides an index of a related measure, QPR, which measures the probability that each associate in the set produces the normed cue as an associate. The P stands for probability and the R stands for resonance to recognize the fact that, if a resonant connection exists between the normed word and one of its associates, then there must be a connection in both directions. In activation models, activation can presumably resonate between the initiating stimulus and the back-connected associate. This index is calculated by simply counting the number of associates in the set that produce the cue word as an associate and then dividing by set size (minus MIAS if any). The 21st field provides a companion value called QRSG representing the resonance strength of the cue. This index is calculated by cross-multiplying cue-to-associate strength by associate-to-cue strength for each associate in the set and then summing the result. Table 5 illustrates this calculation for the cue ABILITY. The table includes only resonating associates because associates that do not produce the cue word do not contribute anything to the sum, i.e., they zero out.
|Example calculation of QRSG|
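The three cue indices can be sketched together as below. Several points are assumptions rather than the authors' stated procedure: MIAS is read as the associates in the set that have not themselves been normed, a word counts as normed when it appears as a key in the hypothetical strengths dictionary, and each directed link between two associates counts as one connection. The ABILITY, CAPABILITY, and COMPETENCE strengths are those quoted in the text; ASSOC_X is a placeholder for a non-normed associate, and the set shown is not the full ABILITY set from Table 5.

```python
def usable_associates(strengths, cue):
    """Associates of the cue that have themselves been normed (set minus MIAS)."""
    return [a for a in strengths[cue] if a in strengths]

def mean_connectivity(strengths, cue):
    """QMC: directed links among the usable associates, divided by their number."""
    usable = usable_associates(strengths, cue)
    links = sum(
        1 for a in usable for b in usable
        if a != b and strengths[a].get(b, 0.0) > 0
    )
    return links / len(usable) if usable else None

def resonance(strengths, cue):
    """Return (QPR, QRSG) for the cue."""
    usable = usable_associates(strengths, cue)
    resonant = [a for a in usable if strengths[a].get(cue, 0.0) > 0]
    qpr = len(resonant) / len(usable) if usable else None
    # Non-resonating associates would add zero, so only resonant ones are summed.
    qrsg = sum(strengths[cue][a] * strengths[a][cue] for a in resonant)
    return qpr, qrsg

strengths = {
    "ABILITY": {"CAPABILITY": 0.119, "COMPETENCE": 0.06, "ASSOC_X": 0.03},
    "CAPABILITY": {"ABILITY": 0.282},
    "COMPETENCE": {"CAPABILITY": 0.08},
}
qmc = mean_connectivity(strengths, "ABILITY")
qpr, qrsg = resonance(strengths, "ABILITY")
```

In this reduced set, only CAPABILITY produces ABILITY in return, so QPR is 1 of 2 usable associates and QRSG is the single product .119 * .282.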
The 22nd field provides what we call a Use Code value for the cue, QUC. QUC values are 1's or 0's depending on whether there is an important associate that has not yet been normed. For many of the cues given UC's of 1, all of their associates have been normed, but some cues having non-normed associates with strengths equal to or less than .04 have also been assigned UC's of 1. These were items that, in the senior author's opinion, could be used in experimentation because the missing associates were unlikely to alter the estimates of connectivity and resonance in a significant way. QUC values of 0 indicate that many of the associates, or an important associate, were not normed. Such items should not be selected for purposes of experimentation when the purpose of the study is to investigate the influence of variables linked to the associative organization of the network, such as connectivity and resonance. In general, we recommend using items with UC's assigned a value of 1.
The next 9 fields, fields 23-31, provide information about the target itself, information that is independent of its cue. This information is parallel to that described for cues, so parallel names were created by substituting the letter T for target in front of each designated field. For example, TSS stands for Target Set Size, and this index of how many strong associates there are for a given target is calculated in the same way as it was for the cue. Hence, these designations include TSS, TFR, TCON, and so on.
Quick Reference. Table 6 provides a quick reference guide for the abbreviations appearing at the head of each data field:
|Abbreviations of terms and their equivalencies in Appendix A.|
|TARGET||Response to Normed Word|
|NORMED?||Is Response Normed?|
|#P||Number of Participants Producing Response|
|FSG||Forward Cue-to-Target Strength|
|BSG||Backward Target-to-Cue Strength|
|OSG||Overlapping Associate Strength|
|#M||Number of Mediators|
|MMIA||Number of Non-Normed Potential Mediators|
|#O||Number of Overlapping Associates|
|OMIA||Number of Non-Normed Overlapping Associates|
|QSS||Cue: Set Size|
|QH||Cue is a Homograph?|
|QPS||Cue: Part of Speech|
|QMC||Cue: Mean Connectivity Among Its Associates|
|QPR||Cue: Probability of a Resonant Connection|
|QRSG||Cue: Resonant Strength|
|QUC||Cue: Use Code|
|TSS||Target: Set Size|
|TH||Target is a Homograph?|
|TPS||Target: Part of Speech|
|TMC||Target: Mean Connectivity Among Its Associates|
|TPR||Target: Probability of a Resonant Connection|
|TRSG||Target: Resonant Strength|
|TUC||Target: Use Code|