Of internal edges (mentions between users),
— the number of internal edges (mentions) per node (this gives a measure of how much activity there is inside the community),
— the conductance and the weighted conductance of the community within the whole network,
— the mean sentiment of edges within the community, using the (MC) measure,⁶
— whether the community consisted of a single connected component (good candidate communities will of course be connected; however, very infrequently the Louvain method can generate disconnected communities, by removing a ‘bridge’ node during its iterative refinement of its communities),
— the fraction of internal mentions with non-zero sentiment (some of our candidate communities were composed mainly of users speaking a non-English language, and we used this measure to filter them out; tweets in other languages are likely to be assigned a zero sentiment score, because the sentiment scoring algorithm does not find any English words with which to gauge the sentiment),
— some statistics summarizing the role played in the community by recently registered users; and
— a breakdown of the frequency of participation of users in the community. (For each user in the community, we counted how many distinct days they had been active on Twitter in our data, and then calculated the percentage of these days on which they had posted within the candidate community. We calculated the average across all users in the community, and also split the users up into five bins.)

rsos.royalsocietypublishing.org R. Soc. open sci. 3:…

This code is freely available from https://sites.google.com/site/findcommunities/.
This algorithm, and some refinements to it, are also implemented in the CFINDER program, freely available from http://www.cfinder.org/.
⁶ Owing to time constraints and the large number of tweets involved in community detection, we decided not to calculate the (SS) and (L) scores at this stage.
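The structural statistics in the list above can be illustrated with a minimal sketch. It is not the authors' code. It assumes an undirected mention network given as `(u, v, weight)` triples, and it assumes the common definition of conductance as the cut size divided by the smaller of the two volumes (sums of degrees, or of edge weights for the weighted variant); the text does not say which variant was used. The function name `community_stats` and the returned keys are hypothetical.

```python
from collections import defaultdict, deque


def community_stats(edges, community):
    """Structural statistics for one candidate community.

    edges: iterable of (u, v, weight) undirected mention edges (assumed format).
    community: iterable of node ids belonging to the candidate community.
    """
    members = set(community)
    if not members:
        raise ValueError("community must contain at least one node")

    internal = 0                        # edges with both endpoints inside
    cut, cut_w = 0, 0.0                 # edges crossing the community boundary
    vol = {True: 0, False: 0}           # sum of degrees inside / outside
    vol_w = {True: 0.0, False: 0.0}     # sum of weighted degrees inside / outside
    adj = defaultdict(set)              # adjacency restricted to internal edges

    for u, v, w in edges:
        u_in, v_in = u in members, v in members
        # Every edge adds one unit of degree (and w of weight) to each endpoint.
        for inside in (u_in, v_in):
            vol[inside] += 1
            vol_w[inside] += w
        if u_in and v_in:
            internal += 1
            adj[u].add(v)
            adj[v].add(u)
        elif u_in or v_in:
            cut += 1
            cut_w += w

    # Breadth-first search over internal edges to test connectivity.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    denom = min(vol[True], vol[False])
    denom_w = min(vol_w[True], vol_w[False])
    return {
        "internal_edges": internal,
        "internal_edges_per_node": internal / len(members),
        "conductance": cut / denom if denom else 0.0,
        "weighted_conductance": cut_w / denom_w if denom_w else 0.0,
        "connected": seen == members,
    }
```

Lower conductance values indicate that few mentions cross the community boundary relative to the activity inside it. The `connected` flag corresponds to the single-connected-component check.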
Based on the above statistics, we short-listed a subset of communities. For this subset, we manually inspected a sample of the tweets within each community to assess the topics discussed, and produced a visualization of each community using the program VISONE (http://visone.info/html/about.html). In the end, we selected 18 communities to monitor and study.

Table 1 shows most of the statistics listed above for these 18 communities, in size order. In each numerical column, the highest six values are highlighted in italics and the lowest six values are highlighted in bold (recall that for conductance and weighted conductance, lower values indicate a more tightly knit community). The ‘Algorithm’ column contains ‘L’ for the Louvain method, ‘W’ for the weighted Louvain method and ‘K’ for the k-clique-communities method. We chose six communities from each algorithm.

Table 2 shows frequency of participation, with communities ranked by the third column, which gives the average user participation. This is expressed as a percentage: of the days on which a user was active on Twitter (in our dataset), the percentage on which they were also active in the community. The rightmost five columns show, for each community, how the users’ participation levels break down into five bins. Bins with disproportionately many users in them (i.e. with values more than 0.2) are highlighted in italics.
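The participation measure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: posts are assumed to arrive as `(user, day, in_community)` triples, and the five bins are assumed to be equal-width ranges of 20 percentage points, since the text does not give the bin boundaries. The function name `participation` is hypothetical.

```python
from collections import defaultdict


def participation(posts, members, n_bins=5):
    """Average participation (in percent) and the fraction of users per bin.

    posts: iterable of (user, day, in_community) triples (assumed format), where
           in_community is True if the post falls within the candidate community.
    members: set of users belonging to the community.
    n_bins: number of equal-width bins (an assumption; boundaries are not given).
    """
    active_days = defaultdict(set)      # distinct days each user posted at all
    community_days = defaultdict(set)   # distinct days each user posted in the community

    for user, day, in_community in posts:
        if user not in members:
            continue
        active_days[user].add(day)
        if in_community:
            community_days[user].add(day)

    if not active_days:
        raise ValueError("no posts found for any community member")

    # Percentage of a user's active days on which they posted in the community.
    rates = {
        user: 100.0 * len(community_days[user]) / len(days)
        for user, days in active_days.items()
    }
    mean_rate = sum(rates.values()) / len(rates)

    width = 100.0 / n_bins
    counts = [0] * n_bins
    for rate in rates.values():
        # A rate of exactly 100% is placed in the top bin.
        counts[min(int(rate // width), n_bins - 1)] += 1
    fractions = [count / len(rates) for count in counts]
    return mean_rate, fractions


# Example usage with made-up data.
example_posts = [
    ("ann", "2014-01-01", True), ("ann", "2014-01-02", True),
    ("bob", "2014-01-01", True), ("bob", "2014-01-02", False),
]
mean_rate, fractions = participation(example_posts, {"ann", "bob"})
highlighted = [i for i, f in enumerate(fractions) if f > 0.2]
```

With five bins, a uniform spread of users would put 0.2 in each bin, which is why values above 0.2 mark bins holding disproportionately many users.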
We can see that with the exception of community 4 (weddings), every community has at least a 20 `hard core’ of users, w.