In what percentage of those paintings is a mountain covered with snow?
Given that Ross painted a mountain, there is a 66 percent chance there is snow on it.
What about footy little hills?
Hills appear in 4 percent of Ross’s paintings. He clearly preferred almighty mountains.
How about happy little clouds?
Excellent question, as 44 percent of Ross’s paintings prominently feature at least one cloud. Given that there is a painted cloud, there’s a 47 percent chance it is a distinctly cumulus one. There’s only a 14 percent chance that a painted cloud is a distinctly cirrus one.
What about charming little cabins?
About 18 percent of his paintings feature a cabin. Given that Ross painted a cabin, there’s a 35 percent chance that it’s on a lake, and a 40 percent chance there’s snow on the ground. While 72 percent of cabins are in the same painting as conifers, only 63 percent are near deciduous trees.
How often did he paint water?
All the time! About 34 percent of Ross’s paintings contain a lake, 33 percent contain a river or stream, and 9 percent contain the ocean.
Sounds like he didn’t like the beach.
Quite the contrary. You can see the beach in 75 percent of Ross’s seaside paintings, but the sun in only 31 percent of them. If there’s an ocean, it’s probably choppy: 97 percent of ocean paintings have waves. Ross’s 36 ocean paintings were also more likely to feature cliffs, clouds and rocks than the average painting.
What about Steve Ross?
Steve seemed to like lakes far more than Bob did. While only 34 percent of Bob’s paintings have a lake in them, 91 percent of Steve’s paintings do.
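For readers curious how the "given that" figures above work: each one is a conditional share, the number of paintings containing both elements divided by the number containing the first. Here is a minimal Python sketch of that arithmetic. The six paintings and their tags are invented for illustration, and the helper names `share` and `conditional_share` are hypothetical; none of this is the data or code behind the numbers in this article.

```python
# Hypothetical toy data: each painting is represented as a set of tagged elements.
# These six rows are made up for illustration; they are not Ross's actual paintings.
paintings = [
    {"mountain", "snow", "lake"},
    {"mountain", "snow"},
    {"mountain", "lake"},
    {"cabin", "lake", "conifer"},
    {"ocean", "waves", "beach"},
    {"clouds", "lake"},
]


def share(paintings, element):
    """Fraction of all paintings that contain `element`."""
    return sum(element in p for p in paintings) / len(paintings)


def conditional_share(paintings, element, given):
    """Fraction of the paintings containing `given` that also contain `element`."""
    subset = [p for p in paintings if given in p]
    if not subset:
        raise ValueError(f"no paintings contain {given!r}")
    return sum(element in p for p in subset) / len(subset)


# Overall share: 4 of the 6 toy paintings have a lake.
lake_share = share(paintings, "lake")

# Conditional share: of the 3 toy paintings with a mountain, 2 also have snow.
snow_given_mountain = conditional_share(paintings, "snow", "mountain")

print(f"lake: {lake_share:.0%}")
print(f"snow given mountain: {snow_given_mountain:.0%}")
```

The distinction matters when reading the numbers: an overall share is measured against every painting, while a conditional share is measured only against the paintings that already contain the "given" element.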
One useful lens we can apply to this sort of data — where we’re comparing vectors of information — is a clustering tool. The idea behind clustering is to determine how close certain groups of data are to other points in the data set. Researchers use clustering analysis in all sorts of areas — from biology to consumer marketing — as a way of segmenting a population of, say, plants or people. It allows us to find interesting subsets of data based on how similar or different certain subgroups are from the rest of the set. I used an algorithm to divide the entire set of 403 paintings from “The Joy of Painting” into clusters