In this short piece, we use public Wikipedia data, Python programming, and network analysis to extract and construct a network of Oscar-winning actors.
All images were created by the author.
As the largest free, crowdsourced online encyclopedia, Wikipedia serves as an incredibly rich source of data from a variety of public domains. Many of these domains, from film to politics, involve networks at different levels that express different kinds of social phenomena, such as cooperation. Due to the upcoming Academy Awards, here we show examples from Oscar-winning actors and actresses on how you can turn your wiki site into a network using simple Pythonic methods.
First, let’s look at, for example, how a Wiki list of all Oscar-winning actors is structured.
This subpage has a good representation of everyone who has ever won an Oscar and been given a Wiki profile (there’s a good chance fans won’t miss any actors and actresses). This article focuses on acting, which can be found on the following four subpages, including leading and supporting actors.
urls = { 'actor' :'https://en.wikipedia.org/wiki/Category:Best_Actor_Academy_Award_winners',
'actress' : 'https://en.wikipedia.org/wiki/Category:Best_Actress_Academy_Award_winners',
'supporting_actor' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actor_Academy_Award_winners',
'supporting_actress' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actress_Academy_Award_winners'}
Now let’s write a simple block of code that checks each of these four lists and extracts the names of all artists using the urllib and beautifulsoup packages.
from urllib.request import urlopen
import bs4 as bs
import re# Iterate across the four categories
people_data = []
for category, url in urls.items():
# Query the name listing page and…