As usual, the student manages to sit next to the Professor of Statistics on the bus and, as expected, their conversation revolves around statistics.

Student: Professor, did you see the budget presentation?

Prof: No, but I am interested in the contents. If you saw it, can you briefly tell me what it covered?

Student: Ok, sir. I will try. The budget starts with the sources and uses of funds. The Finance Minister gave data in percentages, for example, on how he is going to raise the money. He mentioned income tax, corporate tax, customs, excise duty, non-budget revenue, the percentage of money to be borrowed, etc.

Prof: You see, we can draw a pie chart, using different colours for different sources.

Student: Prof, that is a good idea. Can we also use a bar chart, with each bar representing a source of income for the government? Alternatively, a single stacked bar can be used to include all sources of income.

Prof: You get the idea. Good.
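As a rough illustration of the bar chart idea, the sketch below prints one text bar per source of funds. The percentage values are made up for the example (they are not the actual budget figures), and `text_bar_chart` is a hypothetical helper name.

```python
# Hypothetical percentage shares, for illustration only (not actual budget data).
sources = {
    "Income tax": 16,
    "Corporate tax": 19,
    "Customs": 9,
    "Excise duty": 14,
    "Non-budget revenue": 23,
    "Borrowing": 19,
}

def text_bar_chart(shares):
    """Return one text line per source, largest share first, one '#' per percentage point."""
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [f"{name:<20}{'#' * pct} {pct}%" for name, pct in ordered]

chart = text_bar_chart(sources)
for line in chart:
    print(line)
```

A pie chart would show the same shares as slices of a circle; the bar form simply makes the ranking of sources easier to read.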

Student: The budget document also talks about debt (the sum total of deficits over the years) and I believe it is around 70% of the GDP. Isn’t this worrisome?

Prof: Of course. We can plot the deficit for each period as a bar chart and then draw the running total, much like a cumulative frequency distribution (ogive). This will tell us how the debt has increased over time.
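The cumulative view the Professor describes can be sketched as a running total of deficits. The yearly figures below are hypothetical placeholders, and `cumulative_debt` is an illustrative name, not anything taken from the budget document.

```python
from itertools import accumulate

# Hypothetical yearly deficits in arbitrary units (illustrative, not real data).
yearly_deficit = {2013: 50, 2014: 52, 2015: 54, 2016: 53}

def cumulative_debt(deficits):
    """Map each year to the running total of deficits up to and including that year."""
    years = sorted(deficits)
    totals = accumulate(deficits[year] for year in years)
    return dict(zip(years, totals))

debt = cumulative_debt(yearly_deficit)
for year, total in debt.items():
    print(year, total)
```

Plotting `debt` against the year gives the ogive-style curve: it never falls as long as every year runs a deficit, and its slope shows how fast debt is accumulating.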

Student: Prof, that is interesting. The finance minister says that less than 3 crores out of a population of 125 crores pay taxes. Is this fair?

Prof: Our government has exempted agricultural income from taxation. Many citizens try to show that all their income has come from agriculture. We can take a random sample of people and ask them whether they have agricultural income. Then we can calculate a conditional probability: given that a person has agricultural income, what is the probability that he files an income tax return?
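A minimal sketch of that conditional probability, assuming each survey response is recorded as a pair of yes/no answers. The sample below is invented for illustration.

```python
# Hypothetical survey records: (has_agricultural_income, files_income_tax).
sample = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, True),
]

def conditional_probability(records):
    """Estimate P(files tax | has agricultural income) from (agri, files) pairs."""
    filing_flags = [files for agri, files in records if agri]
    if not filing_flags:
        raise ValueError("no respondents with agricultural income in the sample")
    return sum(filing_flags) / len(filing_flags)

p_files_given_agri = conditional_probability(sample)
print(p_files_given_agri)
```

The estimate is simply the number of respondents who have agricultural income and file a return, divided by the number who have agricultural income.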

Student: Quite fascinating, Prof. I am learning a lot.

Prof: We can even stratify the entire working population by whether they work in the organized sector, such as a bank, or in the unorganized sector, such as a grocery store. We can classify the working population into these two strata, take a random sample from each stratum, and calculate the probability of them filing a tax return. My hunch is that in the unorganized sector it is a lot easier to hide income, so the probability of them filing an IT return could be very low.

Student: Prof, can the IT department identify and choose people who have not filed an IT return?

Prof: Fairly simple. The IT department can take a convenience sample of people who buy cars or build houses and then go after them if they haven’t filed an IT return.
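The stratified approach can be sketched as follows, assuming each worker is a record with a `sector` label and a `files_return` flag. Both field names, the helper `stratified_filing_rates`, and the toy population are assumptions made for this example.

```python
import random

def stratified_filing_rates(population, per_stratum, seed=0):
    """Sample per_stratum people from each sector and return each sector's filing rate."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = {}
    for person in population:
        strata.setdefault(person["sector"], []).append(person)
    rates = {}
    for sector, members in strata.items():
        chosen = rng.sample(members, min(per_stratum, len(members)))
        rates[sector] = sum(p["files_return"] for p in chosen) / len(chosen)
    return rates

# Hypothetical population: 80 of 100 organized workers file, 10 of 100 unorganized do.
population = (
    [{"sector": "organized", "files_return": i < 80} for i in range(100)]
    + [{"sector": "unorganized", "files_return": i < 10} for i in range(100)]
)

print(stratified_filing_rates(population, per_stratum=30))
```

Sampling within each stratum separately guarantees that both sectors are represented, which a single simple random sample of the whole workforce does not.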

Student: Prof, talking to you always increases my knowledge of statistics.

Prof: Actually, I will do a cluster analysis to identify the tax defaulters.

Student: How will you do that?

Prof: Fairly simple. I will visit all the banks and look at accounts which have a balance of, say, Rs.5 lakhs or more. I will group them into 4 categories (4 clusters), namely:

- Rs.5 lakhs – Rs.10 lakhs
- Rs.10 lakhs – Rs.20 lakhs
- Rs.20 lakhs – Rs.50 lakhs
- Rs.50 lakhs and above

From each of these clusters, I will select a random sample of names and verify whether they have filed their returns. Of course, we need to make sure that within each cluster, people have similar characteristics.
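The grouping and sampling steps above can be sketched as below. Note that this assigns accounts to fixed balance bands rather than running a clustering algorithm. The account data, the helper names, and the choice to put a boundary balance (for example exactly Rs.10 lakhs) in the higher band are all assumptions for illustration.

```python
import random
from bisect import bisect_right

LAKH = 100_000
# Lower edges of the four balance bands, in rupees.
EDGES = [5 * LAKH, 10 * LAKH, 20 * LAKH, 50 * LAKH]
LABELS = ["5-10 lakhs", "10-20 lakhs", "20-50 lakhs", "50 lakhs and above"]

def band_for(balance):
    """Return the band label for a balance, or None if it is below Rs.5 lakhs."""
    idx = bisect_right(EDGES, balance) - 1
    return LABELS[idx] if idx >= 0 else None

def sample_by_band(accounts, per_band, seed=0):
    """Group (name, balance) pairs into bands and draw a random sample of names from each."""
    rng = random.Random(seed)
    bands = {label: [] for label in LABELS}
    for name, balance in accounts:
        label = band_for(balance)
        if label is not None:
            bands[label].append(name)
    return {
        label: rng.sample(names, min(per_band, len(names)))
        for label, names in bands.items()
    }

# Hypothetical account holders and balances.
accounts = [("A", 4 * LAKH), ("B", 6 * LAKH), ("C", 10 * LAKH),
            ("D", 25 * LAKH), ("E", 70 * LAKH), ("F", 8 * LAKH)]
print(sample_by_band(accounts, per_band=1))
```

The sampled names in each band would then be checked against the list of people who have filed returns.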

Student: Very innovative. Prof, thanks for your time.