6.1 Dataset

How are the prices of paintings at auction determined?
What information do we need to estimate the price of a painting at auction?


The dataset records 430 auction prices for Monet paintings, along with the dimensions of each painting and several other variables:

  • Price = Sale price (millions of dollars),
  • Height = Height (inches),
  • Width = Width (inches),
  • Signed = Binary (dummy) variable: 1 if signed, 0 if not,
  • Picture = ID number (identifies repeat sales),
  • House = Code for auction house where sale took place.

Data on Sales of Monet Paintings
PRICE      HEIGHT   WIDTH   SIGNED   PICTURE   HOUSE
3.993780   21.3     25.6    1        1         1
8.800000   31.9     25.6    1        2         2
0.131694    6.9     15.9    0        3         3
2.037500   25.7     32.0    1        4         2
1.487500   25.7     32.0    1        4         2
1.870000   25.6     31.9    1        4         1
5.282500   25.5     35.6    1        5         1
5.065750   26.0     34.3    1        5         2
1.375000   25.6     36.2    1        5         2
2.530000   25.6     36.4    1        6         2
Note: First 10 observations. Source: Greene, W. (2018). Econometric Analysis, 8th edition.


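  • 6 variables: 3 quantitative (Price, Height, Width) + 3 qualitative (Signed, Picture, House).
  • 1 dummy variable (Signed).
  • A unit (painting) can provide more than one observation (price), because the same painting can be sold more than once.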
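
To make the repeat-sales point concrete, here is a minimal sketch that groups the ten observations shown above by picture ID. It uses only the Python standard library; the names `ROWS` and `prices_by_picture` are illustrative and not part of the original dataset.

```python
from collections import defaultdict

# First 10 observations, copied from the table above.
# Columns: price ($ million), height (in), width (in), signed, picture ID, house code.
ROWS = [
    (3.993780, 21.3, 25.6, 1, 1, 1),
    (8.800000, 31.9, 25.6, 1, 2, 2),
    (0.131694, 6.9, 15.9, 0, 3, 3),
    (2.037500, 25.7, 32.0, 1, 4, 2),
    (1.487500, 25.7, 32.0, 1, 4, 2),
    (1.870000, 25.6, 31.9, 1, 4, 1),
    (5.282500, 25.5, 35.6, 1, 5, 1),
    (5.065750, 26.0, 34.3, 1, 5, 2),
    (1.375000, 25.6, 36.2, 1, 5, 2),
    (2.530000, 25.6, 36.4, 1, 6, 2),
]


def prices_by_picture(rows):
    """Group sale prices by picture ID so repeat sales appear together."""
    grouped = defaultdict(list)
    for price, _height, _width, _signed, picture, _house in rows:
        grouped[picture].append(price)
    return dict(grouped)


sales = prices_by_picture(ROWS)

# A picture with more than one recorded price was sold more than once.
repeat_sales = {pic: prices for pic, prices in sales.items() if len(prices) > 1}
print(repeat_sales)
```

In these ten rows, pictures 4 and 5 each appear three times, so ten observations describe only six distinct paintings.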


6.1.1 What Are Variables?

In statistics, a variable has two defining characteristics:

  • A variable is an attribute that describes a person, place, thing, or idea.
  • The value of the variable can vary from one entity to another.

For example,

  • A person's hair color is a potential variable: it could take the value "blond" for one person and "brunette" for another.
  • Blood pressure is a variable, because it can vary across people and across time for a single person.
  • Sex (or gender) is a variable, because it differs across people.
  • Political attitude is a variable, because some people are liberals and some people are conservatives.


Qualitative vs. Quantitative Variables

Variables can be classified as qualitative (also called categorical) or quantitative (also called numeric).

  • Qualitative variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of qualitative or categorical variables.
  • Quantitative variables are numeric. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, which is a measurable attribute of the city. Therefore, population is a quantitative variable.
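
As a small illustration, the sketch below labels the six Monet variables from Section 6.1 using the 3 quantitative + 3 qualitative split stated there. The dictionary `VARIABLE_KINDS` and the helper `names_of_kind` are hypothetical names introduced for this example only.

```python
# Labels follow the "3 quantitative + 3 qualitative" count given in Section 6.1.
VARIABLE_KINDS = {
    "Price": "quantitative",    # sale price in $ million
    "Height": "quantitative",   # inches
    "Width": "quantitative",    # inches
    "Signed": "qualitative",    # dummy: 1 if signed, 0 if not
    "Picture": "qualitative",   # ID number, used only as a label
    "House": "qualitative",     # auction house code, used only as a label
}


def names_of_kind(kinds, kind):
    """Return the variable names that carry the given label, in insertion order."""
    return [name for name, k in kinds.items() if k == kind]


quantitative = names_of_kind(VARIABLE_KINDS, "quantitative")
qualitative = names_of_kind(VARIABLE_KINDS, "qualitative")
print("quantitative:", quantitative)
print("qualitative: ", qualitative)
```

Note that Signed, Picture, and House are stored as numbers but behave as labels: averaging auction house codes, for instance, would not produce a meaningful quantity.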


Discrete vs. Continuous Variables

Quantitative variables can be further classified as discrete or continuous. If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable.

Some examples will clarify the difference between discrete and continuous variables.

  • Suppose the fire department mandates that all firefighters must weigh between 150 and 250 pounds. The weight of a firefighter is an example of a continuous variable, since it could take on any value between 150 and 250 pounds.
  • Suppose we flip a coin repeatedly and count the number of heads. The number of heads could be any integer between 0 and plus infinity, but it could not be just any number in that range: we could not, for example, get 2.3 heads. Therefore, the number of heads is a discrete variable.
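
The two examples above can be mimicked with a short simulation. This is only a sketch: the seed, the number of flips (20), and the uniform draw for weights are assumptions made for illustration, not part of the original examples.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def count_heads(n_flips, rng):
    """Count heads in n_flips fair coin flips; the result is always an integer."""
    return sum(1 for _ in range(n_flips) if rng.random() < 0.5)


def draw_weight(rng, low=150.0, high=250.0):
    """Draw a weight in pounds that may fall anywhere in the allowed range."""
    return rng.uniform(low, high)


heads = count_heads(20, rng)   # discrete: 0, 1, 2, ..., 20
weight = draw_weight(rng)      # continuous: any value between 150 and 250

print(type(heads).__name__, heads)
print(type(weight).__name__, weight)
```

The count of heads is an `int` and can never be 2.3, while the simulated weight is a `float` that can land between any two neighboring values.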


Univariate vs. Bivariate Data

Statistical data are often classified according to the number of variables being studied.

  • Univariate data. When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.
  • Bivariate data. When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see whether there is a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.
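
The same distinction can be illustrated with the ten Monet observations from Section 6.1: the average price is a univariate summary, while a correlation between height and price is a bivariate one. The sketch below uses only the standard library. The choice of the Pearson correlation coefficient and the helper name `pearson` are assumptions made for this example, and ten observations (some of them repeat sales) are far too few to support any conclusion.

```python
import math
from statistics import mean

# Price ($ million) and height (inches) for the first 10 observations in Section 6.1.
PRICE = [3.993780, 8.800000, 0.131694, 2.037500, 1.487500,
         1.870000, 5.282500, 5.065750, 1.375000, 2.530000]
HEIGHT = [21.3, 31.9, 6.9, 25.7, 25.7, 25.6, 25.5, 26.0, 25.6, 25.6]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Univariate: one variable (price) summarized on its own.
mean_price = mean(PRICE)

# Bivariate: the relationship between two variables (height and price).
r = pearson(HEIGHT, PRICE)

print("mean price ($ million):", mean_price)
print("correlation(height, price):", r)
```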
