Q: What is a baby name cluster?
A: It is a collection of related names. Names can be related because they sound similar or because they are common variations. The cluster size can be changed on the search pages by moving the slider left and right. A larger cluster will include very distantly related names, and a small cluster only includes the slightest variants of the main name.
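As a rough illustration of how the slider could work, the sketch below treats the cluster size as a maximum allowed "distance" from the main name. The function name `cluster` and the distance values are invented for this example; they are not the site's real data or code.

```python
def cluster(main_name, distances, max_distance):
    """Return the main name plus every related name within max_distance.

    `distances` maps a related name to its (hypothetical) distance from
    `main_name`; related names are returned nearest first.
    """
    members = [(d, name) for name, d in distances.items() if d <= max_distance]
    return [main_name] + [name for _, name in sorted(members)]


# Made-up distances from "John", for illustration only.
DISTANCES = {"Jon": 1, "Johnny": 2, "Gene": 5, "Wayne": 6}

small = cluster("John", DISTANCES, max_distance=2)   # slider toward "small"
large = cluster("John", DISTANCES, max_distance=6)   # slider toward "large"
```

With these made-up numbers, the small cluster keeps only the close variants, while the large one also pulls in the distantly related names.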
Q: How does the search work?
A: When you type a word into the search box, every name containing that string of characters is shown. For example, entering "sun" in the search box will show a variety of names such as Sunshine, Sunny, Sunil, and Sun. Note that the results are a mixture of male and female names. To search only one gender, be sure to click the correct button beneath the search box.
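The matching described above behaves like a plain substring test. Here is a minimal sketch, assuming a case-insensitive match and a simple list of (name, gender) pairs; the function name `search_names`, the gender codes, and the sample list are assumptions made for the example, not the site's actual code.

```python
def search_names(names, query, gender=None):
    """Return names that contain `query` as a case-insensitive substring.

    `names` is a list of (name, gender) pairs. `gender` is "M", "F", or
    None to search both.
    """
    q = query.lower()
    return [
        name
        for name, g in names
        if q in name.lower() and (gender is None or g == gender)
    ]


# Sample data; the gender labels here are illustrative only.
NAMES = [
    ("Sunshine", "F"),
    ("Sunny", "F"),
    ("Sunil", "M"),
    ("Sun", "M"),
    ("Susan", "F"),
]

everyone = search_names(NAMES, "sun")
boys_only = search_names(NAMES, "sun", gender="M")
```

Note that "Susan" is not returned, because the letters s, u, n do not appear next to each other in it.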
Q: Some of the names that are shown as related look very unrelated. Why?
A: The name-matching algorithm uses a function related to "soundex", which reduces each word to a small set of consonant sounds. For example, the consonants J, W, and G all end up with the same sound. Thus names such as Ewan, Wayne, and Gene all end up being related to John and will be clustered together.
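To see why those names collide, here is a toy version of the idea. It is a deliberately simplified stand-in, not the real Soundex or Double Metaphone rules: the table `SAME_SOUND`, the set `DROPPED`, and the function `toy_key` are all made up for this illustration.

```python
# Toy rule (an assumption for this example): J, W, and G share one sound.
SAME_SOUND = {"J": "J", "W": "J", "G": "J"}

# Toy rule (an assumption for this example): vowels, H, and Y are ignored.
DROPPED = set("AEIOUHY")


def toy_key(name):
    """Reduce a name to a short string of consonant sounds."""
    key = []
    for ch in name.upper():
        if not ch.isalpha() or ch in DROPPED:
            continue
        code = SAME_SOUND.get(ch, ch)
        # Collapse runs of the same sound into a single letter.
        if not key or key[-1] != code:
            key.append(code)
    return "".join(key)


keys = {name: toy_key(name) for name in ["John", "Ewan", "Wayne", "Gene", "Mary"]}
```

Under these toy rules John, Ewan, Wayne, and Gene all reduce to the same key, while Mary does not, which is the kind of match that can look surprising in a cluster.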
Q: What do the red and black mean in the charts?
A: The red means that the names in the cluster reached a particular popularity in a given year. If the chart is flat with no visible bars, then the name is in the top (approximately) 3000 names, but barely. If the chart is not even visible, then the name never reached the top (approximately) 3000 names. The name popularity data was downloaded from the Social Security Administration website. The small charts were drawn using the sparklines PHP library, inspired by Edward Tufte. The big graphs were done using the nice package called Open Flash Chart.
Q: What is a "modified" rank and why do some seemingly less popular names have a higher rank than more popular names?
A: The "modified" rank combines the popularity of all the names in the cluster, with more recent years having a much larger influence on the rank than earlier ones. Because the chart merges an entire decade together, it is impossible to see from the chart what the most recent year's ranking was. Since the modified rank is very strongly influenced by the most recent years, the order of the names may appear to contradict the charts.
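The effect is easier to see with numbers. The sketch below is only an illustration: it assumes an exponential recency weight, which may not match the weighting actually used on this site, and the function name `recency_weighted_score`, the `decay` value, and the counts are all invented.

```python
def recency_weighted_score(counts_by_year, latest_year, decay=0.5):
    """Sum yearly counts, halving the weight for each year further back.

    The exponential weighting is an assumption made for this example.
    """
    return sum(
        count * decay ** (latest_year - year)
        for year, count in counts_by_year.items()
    )


# Invented counts: one name fading, another rising.
fading = {2001: 900, 2002: 800, 2003: 100}
rising = {2001: 100, 2002: 200, 2003: 700}

# What a merged chart would show: the fading name looks bigger.
fading_total = sum(fading.values())
rising_total = sum(rising.values())

# What a recency-weighted rank would see: the rising name wins.
fading_score = recency_weighted_score(fading, latest_year=2003)
rising_score = recency_weighted_score(rising, latest_year=2003)
```

Here the fading name has the larger merged total, yet the rising name gets the better weighted score, so the two orderings disagree.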
Q: Where did you get your data?
A: Most of the names and ranks came from the Social Security Administration website and Wikipedia. Other information, such as the meanings, origins, and some relationships, was found from a variety of online sources. The full list of related names, their "distances", and the modified rank were generated using my own algorithm. I used the Soundex-type function called Double Metaphone to help determine the distance between all pairs of names.
Q: I need more help or have other questions.
A: We are just getting rolling, so please contact me and I will be glad to answer your question and improve this help page.