Once the agent has collected the data, the next step is to organize and analyze it. When scan forms are used to collect data, agents will in many cases be provided with tables reflecting various statistics. When agents conduct their own evaluation, it is their responsibility to develop meaningful tables, graphs, and charts that reflect the program’s impact.
The survey has historically been one of the most popular evaluation tools used by Extension. A traditional approach to survey analysis involves frequency counts and measures of central tendency (Santos & Clegg, 1999). Whether the evaluation relies on scan forms or is agent generated, sorting out the meaningful data to depict an accurate picture of a program’s effectiveness can be an overwhelming task. To understand the statistics generated from the data, agents need a working understanding of some of the definitions associated with descriptive statistics.
One of the most frequently provided statistics is frequency. A frequency is the number of times a data value occurs. When reporting frequency, agents will usually construct a frequency table by arranging the collected values in ascending order with their corresponding frequencies. A typical use of a frequency table is reporting particular characteristics of the target audience. Table 1 below provides an example of a frequency table that depicts the number of acres managed by producers participating in a Crop Management Short Course:
When reporting frequency for particular audiences, agents may prefer to illustrate the number of acres managed by producers with a chart. Charts provide the reader with a snapshot of the data.
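For agents who tally their own evaluations, the frequency table described above can be sketched in a few lines of Python. This is only an illustration: the acreage categories and the responses are made-up values, and the helper name frequency_table is hypothetical, not part of any Extension tool.

```python
from collections import Counter

# Hypothetical survey responses: the acreage category each producer reported.
responses = [
    "1-100", "101-500", "101-500", "501-1000", "1-100",
    "101-500", "Over 1000", "501-1000", "101-500", "1-100",
]

# Categories listed in ascending order so the table reads smallest to largest.
categories = ["1-100", "101-500", "501-1000", "Over 1000"]


def frequency_table(values, ordered_categories):
    """Return (category, count, percent) rows in the given category order."""
    counts = Counter(values)
    total = len(values)
    return [
        (category, counts[category], round(100 * counts[category] / total, 1))
        for category in ordered_categories
    ]


rows = frequency_table(responses, categories)
for category, count, percent in rows:
    print(f"{category:<10} {count:>3} {percent:>6}%")
```

The same rows can feed a bar chart, since each row already pairs a category with its count and percent.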
Figure 1 is an example that illustrates the number of acres managed by producers in a simple bar chart:
Another use of frequency tables is reporting adoption of, or intent to adopt, a change in behavior, a best practice, or a new technology. Table 2 provides an example of how intent to adopt could be reported:
Agents may also prefer to report the intent to adopt with a chart that gives the reader a quick reference for the percent of participants who intend to adopt a best practice, change a behavior, or adopt a technology. Figure 2 provides an illustration of the percent of participants who intend to adopt a best practice:
Descriptive statistics are statistics used to summarize or describe a set of observations, and they are frequently reported by agents. They are most often used to summarize the central tendency of a set of observations and the dispersion of the data set. Measures of central tendency are categories or scores that describe what is “average” or “typical” of a given distribution. These include the mode, median and mean.
The mode is the category with the greatest frequency (or percentage). It is not the frequency itself. It is possible to have more than one mode in a distribution. Such distributions are considered bimodal (if there are two modes) or multi-modal (if there are more than two modes). Distributions without a clear mode are said to be uniform. The mode conveys less information than the median or the mean, but it is the only measure of central tendency that can be used with nominal variables. A nominal variable is another name for a categorical variable. Nominal variables have two or more categories with no natural order. They have no numeric value; an example is the type of exercise participants are engaged in as a result of the program. Another way of thinking about nominal variables is that they are named. Some examples of nominal variables are the occupation of participants, breeds of cattle produced, type of crops produced, name of fruits or vegetables consumed, type of exercise (walking, jogging, swimming, etc.), and demographic descriptions of participants.
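As a minimal sketch of finding the mode of a nominal variable, the example below uses Python's standard statistics module (multimode requires Python 3.8 or later). The exercise responses are invented for illustration.

```python
from statistics import multimode

# Hypothetical responses: type of exercise reported by each participant.
exercise = ["walking", "jogging", "walking", "swimming", "walking", "jogging"]

# multimode returns a list of every category tied for the greatest frequency.
single_mode = multimode(exercise)

# A bimodal example: walking and jogging are each reported twice.
two_modes = multimode(["walking", "jogging", "walking", "jogging", "swimming"])

print(single_mode)  # ['walking']
print(two_modes)    # ['walking', 'jogging']
```

Note that the result is the category name, not its count, which matches the point above that the mode is not the frequency itself.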
The following pie chart (Figure 3) reflects the type of exercise for participants engaged in a healthy living educational program:
The median is the midpoint number. In other words, it’s the number that divides the distribution exactly in half so that half the cases are above the median, and half are below. It’s also known as the 50th percentile, and it can be calculated for ordinal and interval variables.
An ordinal variable is similar to a nominal or categorical variable. The difference between the ordinal and nominal variable is that there is a clear ordering for ordinal variables. For example, suppose you have a variable such as economic status, with three categories (low, medium and high). In addition to being able to classify people into these three categories, you can order the categories as low, medium and high.
An interval variable is a variable that falls on an interval scale. An interval scale is one on which the differences between values are meaningful: the differences between points on the scale are measurable and exactly equal. For example, the difference between 110 and 100 bushels of corn produced per acre is the same as the difference between 90 and 80 bushels of corn produced per acre.
Conceptually, finding the median is fairly simple and entails only putting all of the observations in order from least to greatest and then finding whichever number falls in the middle. This is the reason the median is not an appropriate measure of central tendency for nominal variables (such as types of crops produced): nominal variables have no inherent order.
In datasets with an even number of cases, there will not be a single middle number. If an agent’s dataset has an even number of observations or cases, the median is the average of the two middlemost numbers. For example, for the numbers 18, 14, 12, 8, 6 and 4, the median is 10 (12 + 8 = 20; 20/2 = 10).
One of the median’s advantages is that it is not sensitive to outliers. An outlier is an observation that lies an abnormal distance from other values in a sample. Observations that are significantly larger or smaller than the others in a sample can distort some statistical measures enough to make them extremely deceptive, but the median is not vulnerable to outliers. In other words, it doesn’t matter whether a producer manages 20 or 2000 acres; that observation still counts as only one number.
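The even-count example above can be checked with a short Python sketch using the standard statistics module. The six numbers come from the text; the five-number list is added only to show the odd-count case.

```python
from statistics import median

# The six values from the example above; median() sorts them internally.
even_values = [18, 14, 12, 8, 6, 4]
even_median = median(even_values)  # average of the two middle values, 8 and 12

# With an odd number of values, the single middle value is returned.
odd_values = [18, 14, 12, 8, 6]
odd_median = median(odd_values)

print(even_median)  # 10.0
print(odd_median)   # 12
```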
The mean is what is typically referred to as “the average”. It is considered the highest-level measure of central tendency because it can be used only with interval variables. The mean takes into account the value of every observation and provides the most information of any measure of central tendency. Unlike the median, however, the mean is sensitive to outliers. In other words, one extraordinarily high (or low) value in an agent’s dataset can dramatically raise (or lower) the mean. The mean, often shown as an x with a line over it (x̄), is the sum of all the scores divided by the total number of scores.
Figure 4 provides a summary of the characteristics of the most frequently utilized measures of central tendency:
In addition to measuring central tendency, agents may also need to report the amount of variability in their distribution of data. One of the most commonly reported measures of variability is the standard deviation (SD), which can be reported in tables alongside means or other measures of central tendency. Standard deviation quantifies the amount of variation or dispersion in a set of data values. A low standard deviation indicates that the data points tend to be close to the mean of the set, whereas a high standard deviation indicates that the data points are spread out over a wider range of values.
An example of how to effectively report means and standard deviations in a pretest and posttest survey is provided in Table 3:
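To make the outlier point concrete, here is a small Python sketch comparing the mean and the median with and without one very large value. The acreage figures are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical acres managed by five producers.
acres = [20, 40, 60, 80, 100]

# Same group, but the largest producer manages 2000 acres instead of 100.
acres_with_outlier = [20, 40, 60, 80, 2000]

mean_plain = mean(acres)                  # sum of values / number of values
mean_outlier = mean(acres_with_outlier)   # pulled upward by the single outlier
median_plain = median(acres)
median_outlier = median(acres_with_outlier)  # unchanged by the outlier

print(mean_plain, median_plain)      # 60 60
print(mean_outlier, median_outlier)  # 440 60
```

In this made-up case the mean rises from 60 to 440 acres while the median stays at 60, which is why the median is often the safer summary when a few producers are much larger than the rest.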
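The sketch below illustrates low versus high standard deviation with two made-up sets of scores that share the same mean. It assumes the sample standard deviation (statistics.stdev, which divides by n - 1); the text does not say which form a scan-form report uses, and statistics.pstdev would give the population version instead.

```python
from statistics import mean, stdev

# Two hypothetical sets of knowledge scores with the same mean of 5.
clustered = [4, 5, 5, 5, 6]   # values close to the mean
dispersed = [1, 3, 5, 7, 9]   # values spread over a wider range

sd_clustered = stdev(clustered)  # sample standard deviation (n - 1 denominator)
sd_dispersed = stdev(dispersed)

print(f"clustered: mean={mean(clustered)}, SD={sd_clustered:.2f}")
print(f"dispersed: mean={mean(dispersed)}, SD={sd_dispersed:.2f}")
```

Reporting the mean alone would make these two groups look identical; the SD is what shows that the second group's responses vary far more.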
A chart that would communicate the before and after mean for knowledge gained for various topics covered during a Crop Management Short Course may be appropriate for some audiences. Figure 5 provides an example of how a before and after mean can be depicted with a chart:
Another element used in reporting outcomes from a pretest/post-test or retrospective post-test evaluation is percent change. The formula for calculating the percent change is as follows:
((Mean after – Mean before) ÷ Mean before) ×100
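The formula above translates directly into a short Python function. The function name percent_change and the sample means are hypothetical, chosen only to show the arithmetic.

```python
def percent_change(mean_before, mean_after):
    """Percent change from the 'before' mean to the 'after' mean."""
    if mean_before == 0:
        # The formula divides by the 'before' mean, so it is undefined at zero.
        raise ValueError("mean_before must be non-zero")
    return (mean_after - mean_before) / mean_before * 100


# Hypothetical knowledge means: 2.0 before the program and 3.5 after.
change = percent_change(2.0, 3.5)
print(f"{change:.1f}% change")  # 75.0% change
```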
Table 4 provides an example of how to report percent change for the level of knowledge:
Agents who don’t utilize scan forms to collect evaluation data can design and implement the evaluation to best meet their needs. Organization Development and Regional Program Leaders are available to advise and assist in every step of the process. To analyze the data in these situations, agents may want to utilize the On-Line Statistic Calculator. This free online calculator allows agents to enter up to 5000 values and calculates the minimum value, maximum value, mean, median, mode, and standard deviation. The following provides step-by-step instructions for utilizing the On-Line Statistic Calculator:
In a future Next Step to Success blog post, we will discuss developing charts and reports to assist in telling our story to key stakeholders.
Boleman, C., Cummings, S., & Pope, P. (2005). Keys to education that works: Texas Cooperative Extension’s program development model. Texas Cooperative Extension, College Station, Texas. Publication #345.
Santos, J. R. A. (1999). Cronbach’s alpha: A tool for assessing the reliability of scales. Journal of Extension [On-line], 37(2) Article 2TOT3. Available at: http://www.joe.org/joe/1999april/tt3.php