**Detecting similarities and differences is the precursor to statistics**

As pattern seekers, we sense patterns in objects mainly through our senses of sight and smell. This lets us detect similarities and differences between patterns, which we then record in our memory. In statistics, we tend to group or classify objects. We categorize things in our minds through our ability to describe what is similar and what is different.

Example: A friend invites you over to taste the fruits she has grown in her backyard. You both **look** at an apple and an orange. You immediately notice different patterns of shape and color. You smell the fruits: they have different **smells**, but both smells are refreshing.

Then she invites you to taste the apple and the orange. You find that the apple **tastes sweeter** than the orange. The more you eat, the more you like them. She says, “Here, you can take these small containers home.”

She then says that if you like the taste of the fruits she has grown, you can keep the seeds as you eat and plant them in your own backyard. Looking closely at the orange seed and the apple seed, you notice another pattern: they are almost the **same** size.

Also note that the good times you shared with your friend while trying her fruits can help you better remember the patterns of the fruits. Nostalgia can sometimes motivate you to grow the seeds in your backyard.

**Comparing the apple and the orange**

**Similarities:**

- Both have smells.
- Opening them will reveal seeds.
- The seeds are almost the same size.
- They can be eaten.
- They have refreshing tastes.
- After a while, the seed can grow into a mature plant when planted in your backyard.

**Differences:**

- The apple is less round than the orange.
- The apple seed and the orange seed look different from each other.
- They give off different smells when you open them for eating.
- Different tastes: the apple tastes sweeter, while the orange tastes more sour.
- The orange is juicier than the apple.

The differences between the apple and the orange make you think they are distinct, so you say: “They are different **kinds** of fruits”. You also notice that neither of them is similar to a house. So we have started categorizing subtypes of fruit based on their differences.

Putting aside the differences and focusing on the similarities, the dictionary defines a fruit based on what the five senses perceive: *“The sweet and fleshy product of a tree or other plant that contains seed and can be eaten as food.”*

**As you grow up, you get to recognize more fruits**

The similarities above help you decide to classify the apple and the orange as fruits, because you have tasted several kinds of fruit in your experience (sights, sounds, smells, etc. from your memory). You classify them as fruits because they are different from other objects, for example houses. This is one advantage of language: it helps you differentiate patterns by assigning a word and a meaning to each pattern.

Eventually, you get to taste and see different kinds of fruits:

- Lime
- Lemon
- Grapefruit
- Pomelo

You observe that their seeds look similar. The fruits and their seeds all come in different sizes. They have different colors and have pores in their skin. They all share a common smell. When you open them, you see sacs surrounding the seeds. Apples are crispy but oranges are not. The four fruits above are not crispy either, and all of them are sour.

We now decide to create a fruit sub-class called citrus fruits, due to their similarities. In our abstract thinking, we now call these similarities “characteristics”. We can now say sourness is **characteristic** of citrus fruits.

Other classes (categories) of fruits include **tropical fruits** and **berries**. The key characteristic (similarity) of tropical fruits is that they grow in hotter climates, while many berries prefer cooler climates:

Some tropical fruits:

- Banana
- Papaya
- Mango

Some berries:

- Grapes
- Strawberry
- Blueberry
- Blackberry
- Raspberry
- Lingonberry

This is how we classify objects with the help of our sense of sight and our memory. We can call these classifications and sub-classifications **sets**. This is also how taxonomists classify flora and fauna, and how academia creates sub-disciplines of any field, e.g. science.

Below is a Venn diagram of the set of fruits. The types of fruits are diverse, but to keep it simple, we show only two (berries and citrus fruits). Each type of fruit is a **subset** of the set of fruits, and each subset (below: citrus fruits and berries) has its own **elements**, such as the lemon or the strawberry.
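The same grouping can be sketched with Python's built-in `set` type. This is only an illustration: the variable names are made up for this example, the orange is placed with the citrus fruits, and the apple is put in a catch-all group because the article does not assign it to a sub-class.

```python
# Kinds of fruit named in this article, grouped as Python sets.
citrus = {"orange", "lime", "lemon", "grapefruit", "pomelo"}
berries = {"grape", "strawberry", "blueberry",
           "blackberry", "raspberry", "lingonberry"}
tropical = {"banana", "papaya", "mango"}
other = {"apple"}  # assumption: not assigned to any sub-class here

# The set of fruits is the union of all the smaller groups.
fruits = citrus | berries | tropical | other

print(citrus <= fruits)    # True: every citrus fruit is a fruit
print("lemon" in berries)  # False: a lemon is not a berry
print(citrus & berries)    # set(): the two circles do not overlap
```

In a Venn diagram, `citrus <= fruits` corresponds to the citrus circle sitting entirely inside the fruits circle.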

**Now let’s go back to mathematics: Ratios**

Without the ability to detect patterns through our five senses and memory, ratios would be difficult to understand. Suppose you have harvested 4 oranges and 6 apples throughout the day. This can be expressed in a mathematical form called a ratio.

Suppose you went to your backyard to harvest ripe fruits. Let’s say you select two kinds of fruit, count how many of each you pick using the proper sequence of English numerals, and jot down the quantities in Hindu-Arabic number symbols on a sticky note.

You jot down 4 oranges and 6 apples on your sticky note. This can be expressed in ratio form as 4 oranges **is to** 6 apples.

In ratio shorthand it can be written as **4:6**, where the colon (:) is read aloud as “is to”. Just as we can simplify fractions, we can divide both numbers by their common factor 2 and write it instead as **2:3**. Fractions and ratios are usually written in *simplified form* to make them easier to read. Things are easier to compare when they are simplified.
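Simplifying a ratio means dividing both numbers by their greatest common divisor. A minimal sketch in Python, where the function name `simplify_ratio` is made up for this example:

```python
from math import gcd

def simplify_ratio(a, b):
    """Divide both terms of a:b by their greatest common divisor."""
    divisor = gcd(a, b)
    return a // divisor, b // divisor

oranges, apples = 4, 6
print(simplify_ratio(oranges, apples))  # (2, 3), i.e. 4:6 becomes 2:3
```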

As we can see, we use our knowledge of ratios to **compare** how much of each type of fruit we have collected from our backyard. So ratios are about comparing.

**Multiple ratios**

Your friend has sent you 8 blueberries, 16 grapes and 12 blackberries. This can be written in ratio form as [ 8:16:12 ]. All three numbers have the common factor 4 (based on the multiplication table), so dividing by 4 gives the simplified form [ 2:4:3 ].

Throughout the next week you have eaten 9 apples, 6 oranges, 21 grapes and 3 bananas. This can be written in ratio form as [ 9:6:21:3 ]. All four numbers have the common factor 3 (based on the multiplication table), so dividing by 3 gives the simplified form [ 3:2:7:1 ].
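The same idea works for ratios with any number of terms: find the greatest common divisor of all the numbers, then divide each one by it. A small sketch, assuming a helper named `simplify_terms` (a name chosen only for this example):

```python
from functools import reduce
from math import gcd

def simplify_terms(*terms):
    """Divide every term of a ratio by the divisor they all share."""
    divisor = reduce(gcd, terms)  # gcd of the whole list, pair by pair
    return tuple(term // divisor for term in terms)

print(simplify_terms(8, 16, 12))    # (2, 4, 3)
print(simplify_terms(9, 6, 21, 3))  # (3, 2, 7, 1)
```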

**Basic Statistics: Percentages**

As explained above, in statistics we tend to group or classify objects. Say we have eaten 8 strawberries and 12 grapes on a given day. We classify the objects we have eaten (strawberries and grapes) as fruits. This group or classification can be called a **set**.

So in everyday conversation we can say “you have eaten fruits”, but strictly speaking we can also say “You have eaten from the set of fruits, and you have eaten two kinds of fruit from that set (explained in the classification/Venn diagram above)”. Berries are a subset of fruits, so the strawberry and the grape can be called berries and fruits at the same time.

Now that we have classified the strawberries and grapes into the fruits category (since all berries are fruits), we can count the total number of fruits by imagining that we collect them into a single container. Counting them all together gives 20 fruits.

So we can compare the number of each kind of fruit to the total number of fruits and express the comparison as a fraction or a ratio. It can also be explained this way:

So you have eaten 20 individual fruits today. Please specify exactly which kinds of fruit you have eaten: 8 strawberries and 12 grapes.

A percentage is simply another way to express a ratio, one that takes the main category or set (fruits) into account: “Out of the 20 fruits I have eaten today, I have eaten 12 grapes.” “Out of the 20 fruits I have eaten today, I have eaten 8 strawberries.” These statements can be written in fraction or ratio form, as shown below. Does it now sound similar to the statistics about people you hear on the radio?

8 out of 20 = 8:20, simplified: 2:5, which is the same as 40 out of 100, or 40%

12 out of 20 = 12:20, simplified: 3:5, which is the same as 60 out of 100, or 60%
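Turning “so many out of the total” into a percentage is a single multiplication and division. A minimal sketch, with the helper name `percent_of` chosen only for this example:

```python
def percent_of(count, total):
    """Return count as a percentage of total."""
    return count * 100 / total

eaten = {"strawberries": 8, "grapes": 12}
total = sum(eaten.values())  # 8 + 12 = 20

for kind, count in eaten.items():
    print(f"{kind}: {count} out of {total} is {percent_of(count, total):.0f}%")
# strawberries: 8 out of 20 is 40%
# grapes: 12 out of 20 is 60%
```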

The ratios in the “Multiple ratios” section compare the kinds of fruit with one another and do not take the total number of fruits into account, so they cannot be read directly as percentages. To express them as percentages, we first have to add up the number of fruits your friend has given you (36 fruits in total) and the number of fruits you have eaten over the whole week (39 fruits in total). Getting a total is a form of addition.
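Once the totals are known, each count can be turned into its share of the total. The sketch below uses the counts from the “Multiple ratios” section. The function name `shares_in_percent` and the rounding to one decimal place are choices made for this example only.

```python
def shares_in_percent(counts):
    """Return each count as a percentage of the sum, rounded to 1 decimal."""
    total = sum(counts)
    return [round(count * 100 / total, 1) for count in counts]

friend_gift = [8, 16, 12]  # blueberries, grapes, blackberries
week = [9, 6, 21, 3]       # the four counts eaten during the week

print(sum(friend_gift))                # 36
print(shares_in_percent(friend_gift))  # [22.2, 44.4, 33.3]
print(sum(week))                       # 39
print(shares_in_percent(week))         # [23.1, 15.4, 53.8, 7.7]
```

Because each share is rounded, the percentages in a list may add up to slightly more or less than 100.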

**Vocabulary used in probability**

We use these words every day without knowing they are related to an important field of statistics called probability. The concept of categorizing things into sets is also used hand in hand with probability.

The everyday words we use are *maybe, possibility, odds, chances, likelihood, probably,* etc. These can be called probabilistic vocabulary. Just like comparative vocabulary is the precursor of units and measurement, probabilistic vocabulary is the precursor to probability as a field of statistics.

These words are useful for finding something and for planning or expecting a series of events. When we plan events, we sometimes rely on chance, expecting things based on previous experiences stored in our memory. This will be discussed in Serial 12.

**The odds of finding your desired file.**

Suppose you **forgot the filename** of the file you want to find. The systemwide search function of your computer’s operating system (Spotlight in Mac OS X, Instant Search on the Start Menu in Windows) is then **useless**.

The files on your hard drive can be classified into:

- Documents
- Music
- Pictures
- Movies
- Software Installers

They are subsets of the set of “files”.

Suppose you want to find a video that will help you get better at growing plants in your backyard. Your computer can list all your files, or it can list only the movie files. If you choose the second option, listing only movie files, you get better odds of finding the desired file. Let’s use mathematics to describe this.

Your computer lists the total number of files as 1334. Out of those files are:

This can also be visualized as a pie chart in percentages:

If you don’t select the option to show only movie files, you have to search file by file, which is inconvenient. But if you select “Show only movie files”, you will have better chances of finding the desired file. Now let’s use numbers to compare the convenience of both methods:

If you search file by file, each file you open has 1 out of 1334 odds of being the one you want, but if you select the option “Show only movie files”, the odds become 1 out of 146. To compare the odds more easily, we use percentages. Probability is often expressed in percentages:
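The two chances can be worked out with a few lines of Python. This sketch assumes that the desired file is equally likely to be any file in the list and that you open one file at random. The variable names are made up for this example.

```python
total_files = 1334
movie_files = 146

# Chance, in percent, that one randomly opened file is the desired one.
chance_all = 100 / total_files
chance_movies = 100 / movie_files

print(f"All files:   {chance_all:.2f}%")     # All files:   0.07%
print(f"Movies only: {chance_movies:.2f}%")  # Movies only: 0.68%

# How many times better the filtered list is (about 9 times).
print(round(total_files / movie_files, 1))
```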

Note that “percent” is semantically similar to “hundredths”. So 0.07 percent means seven hundredths of one hundredth, which is 7 ten-thousandths (0.0007). If we visualize it, 7 ten-thousandths is close to 1 out of 1334 (one one-thousand-three-hundred-and-thirty-fourth, about 0.00075). “Cent” is derived from the Latin word for “hundred”.
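Reading a percentage as “that many hundredths” can be checked with exact fractions. A small sketch using Python's standard `fractions` module, where the helper name `percent_to_fraction` is made up for this example:

```python
from fractions import Fraction

def percent_to_fraction(percent_text):
    """Read a percentage as 'that many hundredths' and return an exact fraction."""
    return Fraction(percent_text) / 100

rounded = percent_to_fraction("0.07")
print(rounded)  # 7/10000

# Compare the rounded percentage with the exact chance of 1 out of 1334.
exact = Fraction(1, 1334)
print(float(rounded))
print(float(exact))
```

The two decimal values are close but not equal, because 0.07% is a rounded figure.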

One advantage of percentages is that they let us express fractions in writing, rather than drawing diagrams all the time. One disadvantage is that rounded percentages are not precise enough for every situation, but they are precise and simple enough for everyday use. Also, as you may notice, percentages look cleaner than fraction notation.