Ranking top UK PR blogs using social network analysis

Quick overview so you don’t have to read the whole thing:

You can rank blogs using social network analysis methods (well at least I think you can). I’ve done it for PR blogs – see last pic. Done.

Ranking

In society we don’t often rank people on a continuous scale; we usually group them discretely based on a number of demographics/characteristics such as their level of education, class or profession. You wouldn’t give a rating out of ten for people you know – you’d label them along the lines of “friends”, “people you like”, “people you don’t know”, “people you don’t like”, etc.

It’s the same in PR; during a media outreach campaign, we tend to rank target press into tiers and prioritise them according to how important they are to the client. Blogger outreach is slightly more complicated. With the amount of information the web presents us with, it should be easier. But all too often many of us take data from Google, Yahoo, Alexa, inbound links, etc. and pull it through an arbitrary algorithm which ranks the websites accordingly. It helps with the measurement/review process and confuses the hell out of clients. This post hopes to take some of the guesswork out of this process or at least provide a viable alternative.

One way of looking at ranking is how a blog/website sits within its respective networks. My aim is to apply a bit more automation and maths to the process of identifying influencers. It’s never going to replace proper research, but – to some extent – it’s maths and if you accept the methodology, you cannot argue with the results (only with how you interpret them).

To understand ranking using social network analysis, we have to understand how prestige, another measurement of influence, works.

Prestige

Think about the most popular person when you were at school. I’m not sure if this is the same in posh schools, but at mine the main characteristics of popularity largely depended on gender. For girls, she would simply be the hottest one (or as hot as a 15 year old could be). She went out with the 26 year old fridge repair man and occasionally stayed at his one bedroom flat above one of the estate’s many local off licenses. For the boy, it was whoever was hardest.

What made their popularity more apparent was that everyone wanted to hang out with them, but they just thought you and your mates were geeks. Prestige works in the same way. From a social network analysis point of view, popularity is the number of positive choices received by someone, while prestige also takes into account how many of those links are reciprocated. If there are two vertices in a network and the choices are symmetrical (i.e. they both either like or dislike each other), we can assume that they are of the same rank. However, if the ties are asymmetrical we would assume that the receiver is ranked above the sender.
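To make the difference a bit more concrete, here is a rough Python sketch of the two ideas. The node numbers and the list of choices are made up for illustration, and the function names are mine rather than anything from a particular tool.

```python
from collections import defaultdict

# Hypothetical directed choices: (sender, receiver) means "sender chooses receiver".
choices = [(2, 1), (3, 1), (2, 3), (3, 2)]

def popularity(edges):
    """Count the choices each node receives (its indegree)."""
    counts = defaultdict(int)
    for sender, receiver in edges:
        counts[sender] += 0   # make sure senders show up even with no inbound ties
        counts[receiver] += 1
    return dict(counts)

def compare_rank(edges, a, b):
    """Compare two nodes using only the ties between them."""
    edge_set = set(edges)
    a_to_b = (a, b) in edge_set
    b_to_a = (b, a) in edge_set
    if a_to_b == b_to_a:
        return "same rank"    # mutual tie or no tie at all: symmetrical
    # Asymmetrical tie: the receiver is ranked above the sender.
    return f"{b} above {a}" if a_to_b else f"{a} above {b}"

print(popularity(choices))
print(compare_rank(choices, 2, 1))
print(compare_rank(choices, 2, 3))
```

Node 1 receives two choices and returns none, so it sits above both senders, while nodes 2 and 3 choose each other and end up on the same rank.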

When we are deciding whether to follow someone back on Twitter, most of us look at their follower/following ratio. If they are following 25,000 people and are only followed by 600 then we wouldn’t bother following them. However, if they have many more followers than people they follow, then they must be of interest to some people. Obviously, not all positive choices are, well, positive. For instance if someone owes someone else money the ties would need to be reversed to take this into account.

Here’s a very simple Triad (a group containing three nodes) illustrating my point. If you want to know more about this, have a look at Triadic analysis and balance theory where you’ll see this particular triad referred to as 120U. Here we can see that there are two clusters: nodes 2 and 3, and node 1; node 1 is ranked higher than 2 and 3 (which are of the same rank).

This model (transitivity) also presumes that, like the chain of command in an army, if node 2 takes orders from node 1 then a node which takes orders from node 2 also takes orders from node 1. We might see this in how we receive news: Mashable reports on a particular story, which is then repeated by a blogger and then read by Person A. Person A is essentially getting their news from Mashable.
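The news example can be sketched in a few lines of Python. This is only an illustration of the transitivity idea: the dictionary of who reads whom is hypothetical, and the function simply follows the chain until it runs out.

```python
def reachable_sources(gets_news_from, start):
    """Return every source that `start` relies on, directly or indirectly."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for source in gets_news_from.get(node, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen

# Hypothetical chain: Person A reads the blogger, the blogger reads Mashable.
gets_news_from = {
    "Person A": ["Blogger"],
    "Blogger": ["Mashable"],
}

sources = reachable_sources(gets_news_from, "Person A")
print(sorted(sources))
```

Person A never reads Mashable directly, but it still turns up in their list of sources because the tie is passed along the chain.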

Structural prestige vs social prestige

Does high prestige equal high influence? Yes and no. Structural prestige does not always readily lend itself to real world situations. For example, a profession such as law may be perceived as ‘prestigious’ (social prestige) but there are unlikely to be many other professions ‘linking’ to it. On the other hand, administrators may not be commonly associated with prestige but the profession is more likely to have a high ‘structural prestige’ in terms of the number of other professions it is linked to.

However, as with all types of influence I’ve discussed in previous posts there is often a very clear overlap and it is also dependent on the context. For example, although a lawyer may not be structurally prestigious (in terms of other professions linking to them), there is a good chance that many people will choose them as a source of information because of their standing within a community.

Although there are other ways of measuring prestige (such as proximity or the input domain of the node), for the purpose of this blog post I will only be referring to the criteria outlined in the example below.

Ranking of UK PR Blogs

Quick overview explaining methodology

As a starting point, I’m using this list of UK PR Blogs to collect data. I believe Mat worked with a number of PR bloggers to put it together a while back for an experiment he was working on. I’m aware that there are quite a few blogs missing (top fella Ben Matthews’ blog is one notable omission). I’ll then use Porter Novelli’s Rufus Tool to spider the blogs and see how they all link together. I am making the assumption that a link from one blog to another suggests that the linker has read that site. Link backs may not be completely reliable, but it is essentially what Google does and who would argue with that? – well me, a bit further down.

I’m then going to run the raw data through a piece of analysis software called Netdraw and delete all the sites that do not belong in the original seed list. The resulting network will then be run through another analysis package called Pajek. If I have time (bear in mind it’s actually Christmas Day as I write this sentence) I might try and look at the types of sites and relationships between them, though it looks increasingly likely that I’ll save this for my Summer 2010 post.

Here are the results:

Unprocessed network map


This is what the tool punts out initially before we process it. At the minute, there is too much information for it to be useful. What we can see is a massive cluster central to the map where I’d expect to find the well linked, older blogs and the newer Litman/Jed blogs somewhere on the periphery but still fairly central. Just outside of this are the blogs which are related to PR but not strictly about it. If you look hard enough, you’ll see Will McInnes’ blog and I suspect Mat’s there somewhere. A few years ago, these would probably have been more central but with so many me-too PR blogs only linking to other PR folk, they are increasingly peripheral to this network of PR bloggers. The next step is to begin parsing this list to make sense of it, which, in all honesty, is a bit of a ball ache.

This involved exporting the raw data as a VNA file and a little bit of Excel magic (with some vlookup action). Basically I asked Excel to highlight where there were duplicates of the original list and tagged them with the attribute “ORIGINAL” and the others as “DELETE”, which allowed me to literally switch off the websites I didn’t want to use. ‘Partitioning’ can also be used to label sites to give you an overview, for instance of how certain types of cancer sites interact with government sites, and is an excellent method of painting an online landscape.
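For anyone who would rather avoid the Excel step, the same tagging can be sketched in Python. This is not what I actually ran: the site names and crawled links below are placeholders, and it assumes the crawl has already been flattened into (linker, linked-to) pairs.

```python
# Hypothetical seed list and crawl output, for illustration only.
seed_list = {"blog-a.example", "blog-b.example", "blog-c.example"}
crawled_links = [
    ("blog-a.example", "blog-b.example"),
    ("blog-a.example", "news.example"),
    ("news.example", "blog-c.example"),
    ("blog-c.example", "blog-a.example"),
]

def tag_sites(edges, seeds):
    """Label every site in the crawl as ORIGINAL (on the seed list) or DELETE."""
    sites = {site for edge in edges for site in edge}
    return {site: ("ORIGINAL" if site in seeds else "DELETE") for site in sites}

def keep_original(edges, tags):
    """Keep only links where both ends are tagged ORIGINAL."""
    return [(s, t) for s, t in edges
            if tags[s] == "ORIGINAL" and tags[t] == "ORIGINAL"]

tags = tag_sites(crawled_links, seed_list)
filtered = keep_original(crawled_links, tags)
print(filtered)
```

Any link that touches a site outside the seed list is dropped, which is the code equivalent of switching those sites off.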

Netdraw

So when I opened Netdraw I started with something like this:

Which once filtered looks like this:

Now this is a bit nicer. Look at Jordan’s fairly recent Digital Prescription blog there, and notice how Geordie ASBO-dodger Stephen ‘Ste’ Davies hogs all the attention.

Analysing the network

What we can do using Netdraw is identify the most popular blog in this network (count the number of inbound links), highlight those that are an important conduit for the flow of information (betweenness centrality) and, if I really could be bothered, we could also work out eigenvector centrality or Markov centrality (as described in my post a couple of months ago) but we won’t do all that. Here’s what it looks like when I’m just working out popularity and betweenness.

I’ve sized each node according to its popularity or indegree (number of inbound links). It’s no surprise to find that the most popular are the older blogs such as Mr Hobson’s, Davies’ and Drew B’s blog. See my previous post about preferential attachment and the rich getting richer. Betweenness is interesting in this network. I’ve coloured the nodes into three groups. The sites with low betweenness centrality are grey, orange nodes have medium betweenness and red nodes are high. The biggest winner in all of this is Drew’s blog, which is both the most popular and the most important for spreading information in the network. Other noticeable winners are Consolidated’s Mike Litman and Wildfire’s Danny Whatmough. I’m not going to read too much into this now; identifying these sites is something we’ve been working on for a while and I know Mat’s done a similar post recently. This is just me messing and getting distracted.
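If you want to see what those two numbers mean without opening Netdraw, here is a small standalone Python sketch. It counts inbound links for indegree and uses Brandes’ algorithm for unnormalised betweenness on a directed, unweighted network. The four-node network is invented, and I am not claiming this matches Netdraw’s output exactly, as it may scale or normalise the scores differently.

```python
from collections import deque

def indegree(nodes, edges):
    """Popularity: number of inbound links per node."""
    counts = {n: 0 for n in nodes}
    for _, target in edges:
        counts[target] += 1
    return counts

def betweenness(nodes, edges):
    """Unnormalised betweenness (Brandes' algorithm, directed, unweighted)."""
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths to each node.
        order = []
        predecessors = {n: [] for n in nodes}
        paths = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, sharing out credit along the paths.
        dependency = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    return score

# Hypothetical network: a links to b, and b, c, d link round in a loop.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b")]
print(indegree(nodes, edges))
print(betweenness(nodes, edges))
```

In this toy network b has the most inbound links and also sits on the most shortest paths, which is the Drew-style double win described above. Splitting the betweenness scores into low, medium and high bands for colouring is then just a matter of picking two cut-off values.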

I’ve removed all the isolates (nodes with no ties) so Becky McMichael and a couple of others disappear completely. Now, straight off you can see that there are some problems with certain aspects of my methodology. Becky is definitely linked by a number of the other blogs and I would expect more connectivity overall. This could be because of a number of things:

  • The blog roll was made by Mat and a random mix of people. Therefore, it would be expected that there would be fewer interlinking nodes and more random blogs. Had I or Wadds compiled the list, I would expect the aforementioned PR blogs (Matthews, Dahljit, etc) to have been included because we are all bum chums.
  • There’s a lack of blog rolls on the home pages of many of the blogs. Jed’s blog for example has a separate page for his links, so while the Peter Pan of PR is a popular fella among the blogging community he’s a bit of a pariah in this network.
  • Bloggers failing to update their links also means that for certain blogs the analytic software sees two different URLs. For example, the Rainier PR URL is still used by many for Wadds’ Tech Blog despite the change to Speed.
  • URLs are also often inconsistent on blogs. Ruder Finn’s Becky McMichael is listed as http://www.BeckyMcMichael.com on the original seed list, but the WordPress address is used on Simon Collister’s blog roll. Again this proves problematic and unfairly omits some blogs.
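The last two problems could probably be reduced by normalising URLs before building the network. Here is a rough Python sketch of that idea. It is not something I ran for this post, and the alias table is entirely hypothetical: you would have to fill it in by hand with the old and alternative addresses you know about.

```python
from urllib.parse import urlparse

# Hypothetical alias table: old or alternative hosts mapped to one canonical host.
ALIASES = {
    "oldname.example": "newname.example",
}

def canonical_host(url, aliases=ALIASES):
    """Reduce a URL to a lower-case host without 'www.', then apply any alias."""
    if "//" not in url:
        url = "//" + url          # lets urlparse find the host when the scheme is missing
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return aliases.get(host, host)

print(canonical_host("http://www.BeckyMcMichael.com"))
print(canonical_host("beckymcmichael.com/blogroll"))
print(canonical_host("http://oldname.example/feed"))
```

The first two addresses collapse to the same host, so the spider would treat them as one blog, and the third is folded into its new home through the alias table.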

I’m not sure whether I’m entirely correct in what I’ve pointed out – perhaps Mat can confirm that these are a problem? While we are nit-picking at my methodology, I thought I’d best highlight another caveat. In my experience, using links and blog rolls is not entirely reliable for assessing what people read. For example, I literally stole my blog roll from Jed, deleted it by accident and haven’t bothered replacing it – that’s how little I care about it. Future politician Simon Collister has also failed to update his for two years it seems – see Alex Pullin’s link. Other bloggers throw out links to try to get on other bloggers’ radars because it costs them nothing. But it’s as good an indicator as anything to be honest. Also ignore bloglines.com – it’s just me messing around.

Using Pajek

This is where we can start analysing networks a bit more. I’m pretty sure you can do this in Netdraw but I’ve not been able to figure it out (well I can’t find the answer on Google). I’ve removed all the asymmetrical ties to identify clusters of mutual ties in the network. We can see that although overall the network is very clustered, there are actually very few mutual ties between blogs. In fact there are only five mutual ties (see my points above for why this may not be entirely correct).
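Stripping out the asymmetrical ties is simple enough to sketch in Python. This is my own illustration rather than what Pajek does internally, and the links are made up.

```python
def mutual_ties(edges):
    """Return each pair of nodes that link to each other, once per pair."""
    edge_set = set(edges)
    pairs = {
        tuple(sorted((a, b)))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }
    return sorted(pairs)

# Hypothetical links: only a<->b and c<->d are reciprocated.
links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c"), ("b", "c")]
print(mutual_ties(links))
```

Everything one-directional falls away, leaving just the reciprocated pairs.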

The next steps are a bit complicated – I’ve asked Pajek to shrink the network into strong components so that it is easier to work out how the groups are ranked. I’ve then mapped the original list of blogs on top of this small network and layered the network below to make more sense.
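Roughly, the idea can be sketched in Python like this. It is one possible reading of the steps rather than Pajek’s own procedure: it uses Kosaraju’s algorithm to find strong components, then puts a component in tier 1 if it links out to no other component, and otherwise one tier below the lowest-ranked component it links to. The blog names are placeholders.

```python
def strong_components(nodes, edges):
    """Kosaraju's algorithm: map each node to a representative of its component."""
    forward = {n: [] for n in nodes}
    backward = {n: [] for n in nodes}
    for s, t in edges:
        forward[s].append(t)
        backward[t].append(s)

    # First pass: record nodes in order of depth-first completion.
    finished, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                finished.append(node)
                stack.pop()

    # Second pass: sweep the reversed network in reverse finishing order.
    component = {}
    for root in reversed(finished):
        if root in component:
            continue
        component[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in backward[node]:
                if prev not in component:
                    component[prev] = root
                    stack.append(prev)
    return component

def tiers(nodes, edges):
    """Tier 1 links to no other component; senders sit below their receivers."""
    component = strong_components(nodes, edges)
    links_to = {c: set() for c in set(component.values())}
    for s, t in edges:
        if component[s] != component[t]:
            links_to[component[s]].add(component[t])
    tier = {}

    def tier_of(c):
        if c not in tier:
            tier[c] = 1 + max((tier_of(d) for d in links_to[c]), default=0)
        return tier[c]

    return {n: tier_of(component[n]) for n in nodes}

# Hypothetical blogs: top1 and top2 link to each other, the rest link upwards.
blogs = ["top1", "top2", "mid", "low"]
links = [("top1", "top2"), ("top2", "top1"), ("mid", "top1"),
         ("low", "mid"), ("low", "top2")]
print(tiers(blogs, links))
```

Under this reading a blog that links out a lot gets pushed down the tiers however many inbound links it has, because every outbound link to another component puts it below that component.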

Noticeable winners are the aforementioned Davies and Hobson. If this was the Divine Right of Kings chart we used to see in history textbooks, the two of them would effectively be sharing the role of God. But it’s only me messing around with some toys (feel free to give yourselves a pat on the back though). Surprisingly, blogs by Drew, Michael Litman and Danny Whatmough, which were so highly ranked in terms of popularity and betweenness earlier, are all in the fourth tier. This is perhaps due to the fact that while they all have a healthy amount of inbound links from others in the network, they also link out a lot. Again, people shouldn’t read too much into this – for a start, the Axicom blog is in the second tier.

So what does this tell us? Probably not too much at the moment. In some ways it’s correct in its assertion that Hobson and Davies should be the most highly ranked. However, the blogosphere probably doesn’t lend itself to this kind of ranking to identify influence – the flow of information is too decentralised, especially in this particular dataset where many bloggers link to each other out of politeness as much as anything else. If you look at content, there’s also not much going on in terms of newsflow – bloggers either come up with their own ideas or regurgitate something they saw on Mashable.

So I’ve pretty much spent my Christmas pissing in the wind. However, what I should perhaps have done (had I had time) is to open the network up completely and not limit it to the original dataset. This would then allow us to see the flow of news in the overall UK PR blogs network (which we could tag). There are also many other ways you can rank blogs using a slightly different methodology – this is just one of them.

40 responses to “Ranking top UK PR blogs using social network analysis”

  1. The key is the original data set and that can be a pain in the ass to gather, interesting stuff here Tim. See I didn’t even make the cut ^_^ Happy 2K10

  2. Tell me about it…I don’t even think I’d be anywhere on this list, yet Jed Hallam’s strutting his stuff near the top there.

    I think the original data set used for this was literally Mat asking people on Twitter to add their name to this list. It would be interesting to see what we’d come up with if we used the list Peter Hay’s compiling over at PR Week (he ignores me when I ask him for it).

    Happy New Year to you too buddy.

  3. I do like to strut…

    Happy New Year!

  4. Not bad work at all for a boy from Bradford.

    So do you advise me to add a blogroll back on to my blog (I promise to include you).

  5. @jedhallam you too and congrats on the engagement

    @stuartbruce yeah it would help but only if it was regularly updated (which i doubt anyone really does unless they have a complete overhaul of their blog). Out of interest, why’d you take it off in the first place?

    I don’t think i’m on anyone’s blogroll unless i work/have worked for them…

  6. Cool – and very interesting at that. Good piece of work.

    H

  7. You’re on mine, my blogroll is a link out to my public feedslist on bloglines, seemed the easiest way to keep things up to date

  8. Am tallest midget in circus.

  9. Keep up the good work but I reckon you need to find a nice northern hobby to keep you entertained.

  10. @wadds I suppose I’m kind of getting obsessed by this social network stuff. To be honest I’m a bit bored. I can spend literally weeks writing a post only for it to tell me that my methodology was wrong. The only thing that’s keeping me going is that I think I’ve only scratched the surface of it. Coal mining/ whippet making/hub cap stealing would be a decent alternative though.

    @stuartbruce – apologies, I didn’t read your comment correctly. For this specific methodology, it’s better to have fewer links because it implies that more people ‘vote’ for you than you for them. Again someone can correct me if I’m wrong, but this is what Google does right? it counts the links, regardless of context (unless it’s spam, etc) and works out pagerank, etc from it.

    Which is why this particular methodology will not work for identifying influencers.

    In the real world we are limited to how much time we can devote to spending time with people and as such we are much more careful with who we spend it with. However, on the web, links are free and easy to give out. The size of people’s blog rolls seems to have quadrupled since the days of PR Guru’s Musings :).

    An ideal tool would scan blog posts for the last 6 months and pick up links from the main body only. If the dataset is big enough it will work. If not, perhaps an online campaign isn’t the correct approach?

    Or it could scan blog rolls and cross reference it with Google reader to delete dead links?

    I’m going to try it with the end result aiming to be a linear model of news flow and see what I come up with.

    I’m just literally making this up as I go along here. Someone shout me down.

  11. This is one of the most interesting blog posts I’ve read in months. As much as I’m depressed that you wrote this over Christmas, it’s massively eye opening and has got me more than a little bit interested in checking out eigenvector centrality.

    Well done chap, it’s a fascinating area and a great post.

  12. Really interesting post Tim.

  13. Thanks Jordan, really insightful. Are you just getting a link back for your blog for SEO purposes? My blog’s got a page rank of about 2, you know that right?

  14. Pingback: Cool stuff – January 4, 2010 — Danny Whatmough.com

  15. Hi Tim,

    Good stuff here. I believe that blogrolls (and other site-wide links) have been devalued by Google as an influence on ranking. Would be handy-dandy to base the influence on the links made in posts, as opposed to sidebars. And, as you say, blogrolls are rarely updated & sometimes stolen.

    Some time ago, Mat asked me (and others) for my Google Reader XML. Has anything been done with that?

  16. I’m not sure – I think Mat played around with it for a while but didn’t have enough data to make a full analysis. Agreed that links made in posts are probably more relevant; however, I’ve got a feeling that the network will then be totally disjointed and unconnected – it’s rare that I post anyway, and even more rare that I repost stuff I’ve read. I know other bloggers are different but if we took this route I’m not sure if there would be sufficient data to make any meaningful conclusion (though I could be well wrong).

  17. Pingback: renaissance chambara | Ged Carroll - Links of the day

  18. Alex King has made a link-harvest plug-in for WordPress that collates and sorts the sites that bloggers actually link to in their posts. I reckon you could get 20-30 people to install it and reap some significant data from that.

    http://alexking.org/projects/wordpress

  19. Hi Ian, thanks for the tip. The only thing about asking people to install the plug-in is that it will only be relevant to this post and the people who read it (i.e. the people who know me).

    The main objective of all this researching and playing around with stuff is to see whether we can take some of the guesswork out of what we do daily and go some way to automating it. The end result is to use some of this methodology in the day-to-day blogger outreach/online influencer outreach grunt work which would require a much more applicable method.

    It would be interesting though…

  20. Hi Tim, one of the research team here flagged your post to me this morning – some good food for thought, ranking and organising this kind of data is no easy feat.

    Really interesting stuff, that’s probably why they gave you a bigger picture than me in the PR Week 29 Under 29 feature 😉

    Take it easy,

    James

  21. Thanks for the comment James. Let’s brush that pic under the carpet shall we 🙂

  22. nice informations.Thanks!!!!!!! 🙂

  23. James it was because our shoes weren’t as funky as Tim’s 😉

    Great post Tim.

    Rachel

  24. Nice Information. keep sharing guys.

  25. It was a good post.I really like it.Thank you for your post. Thanks…..

  26. Hi there.
    Thank you for a great post. It was very helpfull.
    Anyone reading this post should bookmark this guys contents.

    I have a new PC and needed some installation help so i went over to http://www.InstallSoftware.com but they did not provide me with the in depth
    info this guy did. he kicks all the bigger sites’ butts.

    Thanks Again

  27. Some serious stuff to digest here as well as some good links Tim, thanks for posting it.

  28. Pingback: Influence and Virality: A Primer | twopointouch

  29. Great read, very use info here. I will definatly bookmark this site for future reference. keep up the good work!

  30. it is a good and interesting work, thanks

  31. A fascinating post. I would like to do a similar analysis on another blog community. Can you recommend a tool for collecting the data? I searched for Rufus and it does not seem to be available. Your help would be much appreciated!

    Thanks,
    Bob

  32. Thanks for the sharing…. I’ve been doing some of those but wasn’t quite sure what’s next 🙂 now I know

  33. nice post..
    ur explain to scientics…

  34. Tim – you warned me you wrote some riveting stuff on here, but I wasn’t prepared for that. Can only say I’m glad you didn’t waste Christmas Day.

    Really excellent analyses. Superb use of the tools etc. Would be lovely to try and lift the focus a level, and see how much this coheres with the way not only bloggers link with one another via blogrolls, but how non-bloggers consume this stuff too. There’s gold in them thar hills, as they say.

    And, as a bonus, that’s Christmas 2010 sorted for you as well.

    Now, to @domw’s point: I’m off to see about eigenvector centrality too. Don’t want to be caught short next time we share pints.

  35. Hi Tim.

    You don’t know me — but that’s OK. Just wanted to tell you that I’ve been reading your blog for a while now (especially the stuff about social network analysis) and it really gave me some insight. Also, it’s been six months now so I think you should get back to work and make another post 😉

  36. Hi Unagi,

    Thanks for the comment. I’ve been thinking about one for the last few months but haven’t written it yet. I’ll start one this weekend, so expect it around about November time.

    Tim

  37. Sorry for off topic, but 2012 is close, is this really matter?

  38. Pingback: Ranking top UK PR blogs using social network analysis | Much ado about nowt « Brendan Cooper – your friendly social media consultant and PR copywriter. Or: words make ideas make money.
