Quick overview so you don’t have to read the whole thing:
You can rank blogs using social network analysis methods (well at least I think you can). I’ve done it for PR blogs – see last pic. Done.
In society we don’t often rank people on a continuous scale; we usually group them discretely based on a number of demographics/ characteristics such as their level of education, class or profession. You wouldn’t give a rating out of ten for people you know – you’d label them along the lines of “friends”, “people you like”, “people you don’t know”, “people you don’t like”, etc.
It’s the same in PR; during a media outreach campaign, we tend to rank target press into tiers and prioritise them according to how important they are to the client. Blogger outreach is slightly more complicated. With the amount of information the web presents us with, it should be easier. But all too often many of us take data from Google, Yahoo, Alexa, inbound links, etc. and pull them through an arbitrary algorithm which ranks the websites accordingly. It helps with the measurement/review process and confuses the hell out of clients. This post hopes to take some of the guesswork out of this process or at least provide a viable alternative.
One way of looking at ranking is how a blog/website sits within its respective networks. My aim is to apply a bit more automation and maths to the process of identifying influencers. It’s never going to replace proper research, but – to some extent – it’s maths and if you accept the methodology, you cannot argue with the results (only how you interpret them).
To understand ranking using social network analysis, we have to understand how prestige, another measurement of influence, works.
Think about the most popular person when you were at school. I’m not sure if this is the same in posh schools, but at mine the main characteristics of popularity largely depended on gender. For girls, she would simply be the hottest one (or as hot as a 15 year old could be). She went out with the 26 year old fridge repair man and occasionally stayed at his one bedroom flat above one of the estate’s many local off licenses. For the boy, it was whoever was hardest.
What made their popularity more apparent was that everyone wanted to hang out with them, but they just thought you and your mates were geeks. Prestige works in the same way. From a social network analysis point of view, popularity is the number of positive choices received by someone, while prestige also takes into account how many of those links are reciprocated. If there are two vertices in a network and the choices are symmetrical (i.e. they both either like or dislike each other), we can assume that they are of the same rank. However, if the ties are asymmetrical we would assume that the receiver is ranked above the sender.
When we are deciding whether to follow someone back on Twitter, most of us look at their follower/following ratio. If they are following 25,000 people and are only followed by 600 then we wouldn’t bother following them. However, if they have many more followers than those that they follow, then they must be of interest to some people. Obviously, not all positive choices are, well, positive. For instance, if someone owes someone else money, the ties would need to be reversed to take this into account.
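To make the difference between popularity and rank a bit more concrete, here is a rough Python sketch. The names and the set of choices are made up for illustration, and treating "no tie either way" as "unranked" is a simplifying assumption rather than part of any formal model.

```python
# Hypothetical "who chooses whom" ties at school; every name here is invented.
choices = {
    ("geek_a", "popular"),
    ("geek_b", "popular"),
    ("geek_a", "geek_b"),
    ("geek_b", "geek_a"),
}

def popularity(node, ties):
    """Number of positive choices received (the node's indegree)."""
    return sum(1 for sender, receiver in ties if receiver == node)

def compare_rank(a, b, ties):
    """Apply the rule of thumb: mutual tie = same rank, one-way tie = receiver on top."""
    a_to_b = (a, b) in ties
    b_to_a = (b, a) in ties
    if a_to_b and b_to_a:
        return "same rank"
    if a_to_b:
        return f"{b} above {a}"
    if b_to_a:
        return f"{a} above {b}"
    return "unranked"  # assumption: no tie in either direction tells us nothing

print(popularity("popular", choices))
print(compare_rank("geek_a", "popular", choices))
print(compare_rank("geek_a", "geek_b", choices))
```

The two geeks choose each other, so they sit at the same rank, while both of their choices of the popular kid go unreturned, which puts the popular kid above them.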
Here’s a very simple Triad (a group containing three nodes) illustrating my point. If you want to know more about this, have a look at Triadic analysis and balance theory where you’ll see this particular triad referred to as 120U. Here we can see that there are two clusters: nodes 2 and 3, and node 1; node 1 is ranked higher than 2 and 3 (which are of the same rank).
This model (transitivity) also presumes that, like the chain of command in an army, if node 2 takes orders from node 1 then a node which takes orders from node 2 also takes orders from node 1. We might see this in how we receive news: Mashable reports on a particular story, which is then repeated by a blogger and then read by Person A. Person A is essentially getting their news from Mashable.
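The news example can be sketched in a few lines of Python by following the "gets news from" ties as far as they go. The node names other than Mashable are placeholders, and the dictionary layout is just one convenient assumption.

```python
# Reader -> the sources they read directly. "blogger" and "person_a" are placeholders.
gets_news_from = {
    "person_a": {"blogger"},
    "blogger": {"mashable"},
    "mashable": set(),
}

def all_sources(reader, graph):
    """Follow ties transitively: direct sources, then their sources, and so on."""
    seen = set()
    stack = list(graph.get(reader, ()))
    while stack:
        source = stack.pop()
        if source not in seen:
            seen.add(source)
            stack.extend(graph.get(source, ()))
    return seen

print(all_sources("person_a", gets_news_from))
```

Person A never reads Mashable directly, but Mashable still turns up in their set of sources, which is the transitivity assumption in action.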
Structural prestige vs social prestige
Does high prestige equal high influence? Yes and no. Structural prestige does not always readily lend itself to real world situations. For example, a profession such as law may be perceived as ‘prestigious’ (social prestige) but there are unlikely to be many other professions ‘linking’ to it. On the other hand, administrators may not be commonly associated with prestige but the profession is more likely to have a high ‘structural prestige’ in terms of the number of other professions it is linked to.
However, as with all types of influence I’ve discussed in previous posts, there is often a very clear overlap and it is also dependent on the context. For example, although a lawyer may not be structurally prestigious (in terms of other professions linking to them), there is a good chance that many people will choose them as a source of information because of their standing within a community.
Although there are other ways of measuring prestige (such as proximity or the input domain of the node), for the purpose of this blog post I will only be referring to the criteria outlined in the example below.
Ranking of UK PR Blogs
Quick overview explaining methodology
As a starting point, I’m using this list of UK PR Blogs to collect data. I believe Mat worked with a number of PR bloggers to put it together a while back for an experiment he was working on. I’m aware that there are quite a few blogs missing (top fella Ben Matthews’ blog is one notable omission). I’ll then use Porter Novelli’s Rufus Tool to spider the blogs and see how they all link together. I am making the assumption that a link from one blog to another suggests that the linker has read that site. Link backs may not be completely reliable, but it is essentially what Google does and who would argue with that? – well me, a bit further down.
I’m then going to run the raw data through a piece of analysis software called Netdraw and delete all the sites that do not belong in the original seed list. The resulting network will then be run through another piece of analysis software called Pajek. If I have time (bear in mind it’s actually Christmas Day as I write this sentence) I might try and look at the types of sites and relationships between them, though it looks increasingly likely that I’ll save this for my Summer 2010 post.
Here are the results:
Unprocessed network map
This is what the tool punts out initially before we process it. At the minute, there is too much information for it to be useful. What we can see is a massive cluster central to the map where I’d expect to find the well linked, older blogs and the newer Litman/Jed blogs somewhere on the periphery but still fairly central. Just outside of this are the blogs which are related to PR but not strictly about it. If you look hard enough, you’ll see Will McInnes’ blog and I suspect Mat’s there somewhere. A few years ago, these would probably have been more central but with so many me-too PR blogs only linking to other PR folk, they are increasingly peripheral to this network of PR bloggers. The next step is to begin parsing this list to make sense of it, which, in all honesty, is a bit of a ball ache.
This involved exporting the raw data as a VNA file and a little bit of Excel magic (with some vlookup action). Basically I asked Excel to highlight where there were duplicates of the original list and tagged them with the attribute “ORIGINAL” and the others as “DELETE”, which allowed me to literally switch off the websites I didn’t want to use. ‘Partitioning’ can also be used to label sites to give you an overview, for instance of how certain types of cancer sites interact with government sites, and is an excellent method of painting an online landscape.
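For anyone who would rather skip the Excel step, the same tagging idea can be sketched in Python. This is only an illustration of the ORIGINAL/DELETE trick: the domains are placeholders, and it works on a plain list of (source, target) links rather than reading a real VNA file.

```python
# Hypothetical seed list and crawled links; every domain here is a placeholder.
seed_list = {"blog-a.example", "blog-b.example", "blog-c.example"}
crawled_links = [
    ("blog-a.example", "blog-b.example"),
    ("blog-a.example", "news-site.example"),
    ("random-site.example", "blog-c.example"),
    ("blog-c.example", "blog-a.example"),
]

def tag_nodes(links, seeds):
    """Label every site found in the crawl as ORIGINAL (on the seed list) or DELETE."""
    sites = {site for link in links for site in link}
    return {site: "ORIGINAL" if site in seeds else "DELETE" for site in sites}

def keep_original(links, tags):
    """Keep only the links whose two ends are both tagged ORIGINAL."""
    return [(s, t) for s, t in links
            if tags[s] == "ORIGINAL" and tags[t] == "ORIGINAL"]

tags = tag_nodes(crawled_links, seed_list)
filtered = keep_original(crawled_links, tags)
print(filtered)
```

Keeping the tags as a separate attribute, rather than deleting rows outright, mirrors the idea of switching sites off and on without touching the raw data.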
So when I opened Netdraw I started with something like this:
Which once filtered looks like this:
Analysing the network
What we can do using Netdraw is identify which is the most popular blog in this network (count the number of inbound links), highlight those that are an important conduit for the flow of information (betweenness centrality) and, if I really could be bothered, we could also work out eigenvector centrality or Markov centrality (as described in my post a couple of months ago) but we won’t do all that. Here’s what it looks like when I’m just working out popularity and betweenness.
I’ve sized the nodes according to their popularity or indegree (number of inbound links). It’s no surprise to find that the most popular are the older blogs such as Mr Hobson’s, Davies’ and Drew B’s blog. See my previous post about preferential attachment and the rich getting richer. Betweenness is interesting in this network. I’ve coloured the nodes into three groups. The sites with low betweenness centrality are grey, orange nodes have medium betweenness and red nodes are high. The biggest winner in all of this is Drew’s blog, which is both the most popular and the most important for spreading information in the network. Other noticeable winners are Consolidated’s Mike Litman and Wildfire’s Danny Whatmough. I’m not going to read too much into this now; identifying these sites is something we’ve been working on for a while and I know Mat’s done a similar post recently. This is just me messing and getting distracted.
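If you want to see what sits behind those two numbers, here is a rough Python sketch of indegree and betweenness for a small directed network. It uses Brandes' algorithm for betweenness and returns raw, unnormalised scores, so the values will not necessarily match Netdraw's output; the four-node network is invented for the example.

```python
from collections import deque

def indegree(edges):
    """Count inbound links per node."""
    counts = {}
    for source, target in set(edges):
        counts.setdefault(source, 0)
        counts[target] = counts.get(target, 0) + 1
    return counts

def betweenness(edges):
    """Unnormalised betweenness for a directed, unweighted graph (Brandes)."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for s, t in set(edges):
        out[s].append(t)
    score = dict.fromkeys(nodes, 0.0)
    for start in nodes:
        # Breadth-first search that counts shortest paths from start.
        order = []
        preds = {n: [] for n in nodes}
        paths = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        paths[start], dist[start] = 1, 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in out[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing credit along shortest paths.
        credit = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != start:
                score[w] += credit[w]
    return score

# Invented network: a and d both link to b, and b links on to c.
links = [("a", "b"), ("d", "b"), ("b", "c")]
print(indegree(links))
print(betweenness(links))
```

Here b is both the most linked-to node and the only one sitting on anyone else's shortest path, which is the combination described for the biggest winner above.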
I’ve removed all the isolates (nodes with no ties) so Becky McMichael and a couple of others disappear completely. Now, straight off you can see that there are some problems with certain aspects of my methodology. Becky is definitely linked by a number of the other blogs and I would expect more connectivity overall. This could be because of a number of things:
- The blog roll was made by Mat and a random mix of people. Therefore, it would be expected that there would be fewer interlinking nodes and more random blogs. Had I or Wadds compiled the list, I would expect the aforementioned PR blogs (Matthews, Dahljit, etc) to have been included because we are all bum chums.
- There’s a lack of blog rolls on the home pages of many of the blogs. Jed’s blog for example has a separate page for his links, so while the Peter Pan of PR is a popular fella among the blogging community he’s a bit of a pariah in this network.
- Bloggers failing to update their links also means that for certain blogs the analytic software sees two different URLs. For example, the Rainier PR URL is still used by many for Wadds’ Tech Blog despite the change to Speed.
- URLs are also often inconsistent on blogs. Ruder Finn’s Becky McMichael is listed as http://www.BeckyMcMichael.com on the original seed list, but the WordPress address is used on Simon Collister’s blog roll. Again this proves problematic and unfairly omits some blogs.
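The last two problems in that list are really the same one: several addresses pointing at one blog. A possible fix, sketched here in Python, is to boil every URL down to a canonical host before building the network. The alias table is entirely hypothetical and would have to be filled in by hand; only the Becky McMichael URL is taken from the seed list.

```python
from urllib.parse import urlsplit

# Hypothetical alias table: old or alternative hosts -> the host used on the seed list.
ALIASES = {
    "old-blog.example": "new-blog.example",
    "someone.wordpress.example": "someone.example",
}

def canonical_host(url, aliases=ALIASES):
    """Lower-case the host, drop a leading 'www.', then apply any known alias."""
    if "//" not in url:
        url = "//" + url  # lets urlsplit find the host in 'example.com/page'
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return aliases.get(host, host)

print(canonical_host("http://www.BeckyMcMichael.com"))
print(canonical_host("https://someone.wordpress.example/about"))
print(canonical_host("old-blog.example/feed"))
```

Running every source and target through something like this before counting links should stop one blog being split across two nodes, though it cannot help with blog rolls the spider never reaches.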
I’m not sure whether I’m entirely correct in what I’ve pointed out – perhaps Mat can confirm that these are a problem? While we are nit-picking at my methodology, I thought I’d best highlight another caveat. In my experience, using links and blog rolls is not entirely reliable for assessing what people read. For example, I literally stole my blog roll from Jed, deleted it by accident and haven’t bothered replacing it – that’s how little I care about it. Future politician Simon Collister has also failed to update his for two years, it seems – see Alex Pullin’s link. Other bloggers throw out links to try to get on other bloggers’ radar because it costs them nothing. But it’s as good an indicator as anything, to be honest. Also ignore bloglines.com – it’s just me messing around.
This is where we can start analysing networks a bit more. I’m pretty sure you can do this in Netdraw but I’ve not been able to figure it out (well I can’t find the answer on Google). I’ve removed all the asymmetrical ties to identify clusters of mutual ties in the network. We can see that although overall the network is very clustered, there are actually very few mutual ties between blogs. In fact there are only five mutual ties (see my points above why this may not be entirely correct).
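Stripping out the asymmetrical ties is simple enough to sketch in Python if your tool of choice won't do it. This assumes the network is just a list of (source, target) links, and the example links are invented.

```python
def mutual_ties(edges):
    """Return each reciprocated pair once, as an alphabetically sorted tuple."""
    edge_set = set(edges)
    return sorted(
        {tuple(sorted((s, t))) for s, t in edge_set
         if s != t and (t, s) in edge_set}
    )

# Invented links: a and b link to each other, as do c and d; a -> c is one-way.
links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c")]
print(mutual_ties(links))
```

Anything left over after this step is a pair of blogs that both link to each other, which is what the clusters of mutual ties are built from.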
The next steps are a bit complicated – I’ve asked Pajek to shrink the network into strong components so that it is easier to work out how the groups are ranked. I’ve then mapped the original list of blogs on top of this small network and layered the network below to make more sense.
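The shrinking and ranking step can be approximated in Python as below. This is a sketch under assumptions, not a description of what Pajek does internally: it groups nodes that can reach each other into strong components using a simple (slow but readable) reachability check, then gives tier 1 to components that link out to no other component and pushes everything else down by its longest chain of outbound links. The test network is the triad from earlier, with nodes 2 and 3 linking to each other and both linking to node 1.

```python
def reachable(start, out):
    """All nodes that can be reached from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in out[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strong_components(edges):
    """Map each node to the frozenset of nodes it is mutually reachable with."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: set() for n in nodes}
    for s, t in edges:
        out[s].add(t)
    reach = {n: reachable(n, out) for n in nodes}
    component = {}
    for n in nodes:
        if n not in component:
            members = frozenset(m for m in reach[n] if n in reach[m])
            for m in members:
                component[m] = members
    return component, out

def tiers(edges):
    """Tier 1 = components with no outbound links; others sit below what they link to."""
    component, out = strong_components(edges)
    # Links between different components; these can never form a cycle.
    links = {c: set() for c in set(component.values())}
    for s, targets in out.items():
        for t in targets:
            if component[s] != component[t]:
                links[component[s]].add(component[t])
    tier = {}

    def tier_of(c):
        if c not in tier:
            tier[c] = 1 + max((tier_of(d) for d in links[c]), default=0)
        return tier[c]

    return {node: tier_of(c) for node, c in component.items()}

triad = [(2, 3), (3, 2), (2, 1), (3, 1)]
print(tiers(triad))
```

Under this rule a blog that links out a lot ends up in a lower tier however many inbound links it has, because every outbound link to a higher group pushes it down.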
Noticeable winners are the aforementioned Davies and Hobson. If this was the Divine Right of Kings chart we used to see in history textbooks, the two of them would effectively be sharing the role of God. But it’s only me messing around with some toys (feel free to give yourselves a pat on the back though). Surprisingly, blogs by Drew, Michael Litman and Danny Whatmough, which were so highly ranked in terms of popularity and betweenness earlier, are all in the fourth tier. This is perhaps due to the fact that while they all have a healthy amount of inbound links from others in the network, they also link out a lot. Again, people shouldn’t read too much into this – for a start, the Axicom blog is in the second tier.
So what does this tell us? Probably not too much at the moment. In some ways it’s correct in its assertion that Hobson and Davies should be the most highly ranked. However, the blogosphere probably doesn’t lend itself to this kind of ranking to identify influence – the flow of information is too decentralised, especially in this particular dataset where many bloggers link to each other out of politeness as much as anything else. If you look at content, there’s also not much going on in terms of newsflow – bloggers either come up with their own ideas or regurgitate something they saw on Mashable.
So I’ve pretty much spent my Christmas pissing in the wind. However, what I should perhaps have done (had I had time) is to open the network up completely and not limit it to the original dataset. This would then allow us to see the flow of news in the overall UK PR blogs network (which we could tag). There are also many other ways you can rank blogs using a slightly different methodology – this is just one of them.