| Need for Dataset |
[May. 31st, 2010|09:41 pm] |
Hi all, I'm trying to do some research on LiveJournal about users' moods in a social network, but right now I have no data. I need a list of users and their entries, with the time and mood attached to each entry. Does anyone have such a dataset and would be willing to share it? Please contact me if you can help. Thank you very much.
|
| lots of dated events involving people |
[Apr. 18th, 2010|10:45 pm] |
http://ljmindmap.com/lj_friending_events.zip
Here's a compressed file (WinZip on Windows 7, 1.3 GB) containing a sequence of events like this: 2036;actor;Y/N;target. Each line is a LiveJournal friending event, either adding (Y) or removing (N), involving an actor and their target. The number is the number of days since Jan 1, 2000. The dataset covers many years but does not cover every user. I have a database of fdata snapshots and a distiller that extracts fdatas and determines the event timeline by interpolating everyone's dated fdata samples. Everyone appears to add each other around day 2034 because that is when my data collection started. The file contains more than 100 million events.
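A minimal sketch for reading the events, assuming each line of the unzipped file is exactly day;actor;Y/N;target and that day 0 is Jan 1, 2000 (so the sample day 2036 decodes to 2005-07-29 under that assumption). It rebuilds the directed friend list as of a chosen day by keeping, for each (actor, target) pair, the latest event on or before that day. The names FriendEvent, parse_event and friends_as_of are hypothetical, and since the archive's inner file name isn't given, the function takes any iterable of lines.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Iterable, NamedTuple

# Assumption: "days since Jan 1, 2000" means day 0 == 2000-01-01.
EPOCH = date(2000, 1, 1)


class FriendEvent(NamedTuple):
    day: int      # days since EPOCH
    actor: str    # user who added or removed a friend
    added: bool   # True for "Y" (add), False for "N" (remove)
    target: str   # user who was added or removed

    @property
    def when(self) -> date:
        """Calendar date of the event."""
        return EPOCH + timedelta(days=self.day)


def parse_event(line: str) -> FriendEvent:
    """Parse one 'day;actor;Y/N;target' line."""
    day, actor, flag, target = line.strip().split(";")
    if flag not in ("Y", "N"):
        raise ValueError(f"unexpected add/remove flag: {flag!r}")
    return FriendEvent(int(day), actor, flag == "Y", target)


def friends_as_of(lines: Iterable[str], cutoff_day: int) -> set[tuple[str, str]]:
    """Directed (actor, target) friend edges in effect at the end of cutoff_day.

    The file need not be sorted: for each pair we keep the latest event on or
    before cutoff_day (ties on the same day go to the later line).
    """
    latest: dict[tuple[str, str], tuple[int, bool]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = parse_event(line)
        if event.day > cutoff_day:
            continue
        key = (event.actor, event.target)
        previous = latest.get(key)
        if previous is None or event.day >= previous[0]:
            latest[key] = (event.day, event.added)
    return {pair for pair, (_, added) in latest.items() if added}
```

Usage would look like `with open(path) as fh: edges = friends_as_of(fh, 2500)`. The per-pair dictionary keeps memory proportional to the number of distinct pairs, which for 100+ million events may still need a lot of RAM. Given the note above, adds clustered around day 2034 are better treated as the start of observation than as real friending.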
|
| A Portrait of Russian LiveJournal - "Formal" Approach |
[Apr. 16th, 2010|10:56 pm] |
We have attempted an analysis of the Russian LiveJournal, guided by the ideas of social network analysis (SNA) and based on the graph of mutual friends. Our motivation is twofold:
- We want a structured picture of LJ and to observe how it evolves over time;
- We want a tree of top posts instead of the linear list currently available (in other words, a separate top list for each cluster).
The raw data was obtained as follows.
- A list of the top 100,000 bloggers was obtained via the Rating LiveJournal blogs service. The number 100,000 is arbitrary: we chose it to reduce traffic and computation costs (which are linear in the number of blogs considered);
- For each blog on that list we retrieved four lists: its friends and its communities via the LJ API, its interests via the LJ API, and its tags via the query http://username.livejournal.com/tag (see http://www.livejournal.com/bots/ for explanations of the nature of these lists).
As a result we obtained:
- 919,654 tags;
- 765,407 interests;
- 108,636 communities.
The raw data can be represented as a set of graphs, for example (user, user), (user, tag) and (user, community), and as hypergraphs, for example (user, tag, community). As a next step, we can apply one of the algorithms developed for finding community structure in graphs. As a first attempt, we considered the undirected graph of mutual friends. The graph constructed from the raw data contains about 100,000 vertices and 10,050,000 edges. For clustering we used a modularity optimization algorithm, which has a free parameter controlling the resolution limit. Applying the algorithm yields a partition of the graph's vertices. This partition generates a weighted factor graph whose vertices are the elements of the partition, and the same procedure is then applied to the factor graph. This iterative process stops when we obtain a factor graph consisting of a single vertex. As a result, we obtain a hierarchical partition of the input graph. In the current version, we obtained approximately 500 clusters (with more than 50 blogs each) at the bottom level, and the depth of the hierarchy tree is 4.
To characterize a cluster, we use its tags ("What the bloggers say"), interests ("Interests") and communities ("Popular communities"). To select representative characteristics, we used the F-measure. This procedure yields a minimalist portrait of each cluster.
What do the clusters we obtained mean? Two types of relations forming some of the clusters are immediately obvious: the first is sharing a common subject (an interest, a persona, etc.), the second is sharing a geography. Another type of cluster we discovered is the spam cluster (though this is not exactly a separate type of relation, since it can be considered as sharing a specific interest). There may also be clusters that represent neither of these two types of relations.
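The post doesn't name the exact modularity optimizer it used, but the loop it describes (partition, collapse each part into one vertex of a weighted factor graph, repeat) has the shape of the Louvain method. Here is a toy sketch in that style, not the authors' code. It assumes the free parameter is the usual resolution factor gamma in front of the null-model term of weighted modularity; every function name is hypothetical. It recomputes modularity from scratch for each trial move, so it only illustrates the idea and would be far too slow for a graph with 10 million edges.

```python
from collections import defaultdict


def modularity(edges, part, gamma=1.0):
    """Weighted modularity with resolution gamma.

    edges: list of (u, v, w) for an undirected graph; u == v is a self-loop.
    part:  dict mapping every node to a cluster id.
    Q = sum over clusters c of L_c / m - gamma * (d_c / (2m))**2, where L_c is
    the edge weight inside c, d_c the total degree of c, m the total weight.
    """
    m = sum(w for _, _, w in edges)
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[part[u]] += w  # a self-loop adds 2w to its node's degree
        degree[part[v]] += w
        if part[u] == part[v]:
            internal[part[u]] += w
    return sum(internal[c] / m - gamma * (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges, gamma=1.0):
    """Greedily move single nodes to a neighbouring cluster while Q increases."""
    nodes = sorted({x for u, v, _ in edges for x in (u, v)}, key=str)
    neighbours = defaultdict(set)
    for u, v, _ in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    part = {n: n for n in nodes}  # start from singleton clusters
    improved = True
    while improved:
        improved = False
        for n in nodes:
            home = part[n]
            best_c, best_q = home, modularity(edges, part, gamma)
            for c in sorted({part[x] for x in neighbours[n]} - {home}, key=str):
                part[n] = c  # try the move
                q = modularity(edges, part, gamma)
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            part[n] = best_c  # keep the best option (possibly staying home)
            if best_c != home:
                improved = True
    return part


def aggregate(edges, part):
    """Factor graph: one vertex per cluster, parallel edges summed into weights."""
    weights = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((part[u], part[v]), key=str)
        weights[(a, b)] += w  # intra-cluster edges become self-loops
    return [(a, b, w) for (a, b), w in weights.items()]


def hierarchical_partition(edges, gamma=1.0):
    """Repeat partition + collapse; return one {vertex: cluster} map per level.

    Level 0 maps original users to clusters, level 1 maps level-0 clusters to
    coarser clusters, and so on.
    """
    levels = []
    while edges:
        part = local_moving(edges, gamma)
        n_clusters = len(set(part.values()))
        if n_clusters == len(part):  # no vertex merged: hierarchy is complete
            break
        levels.append(part)
        if n_clusters == 1:  # a single vertex remains
            break
        edges = aggregate(edges, part)
    return levels
```

One design note: collapsing a partition into a factor graph with self-loops preserves modularity, so each level starts from exactly the score the previous level reached. Unlike the post's rule of running down to a single vertex, this sketch also stops when a level produces no merges, because plain greedy modularity gain usually refuses the final merges.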
Our clustering is "hard" (each blogger belongs to exactly one cluster), which is obviously an oversimplification. We could try to compute, for each blogger, the probability of belonging to each cluster. Alternatively, we could use soft clustering algorithms from the outset. Our clustering procedure is quite flexible, and it can accommodate external (for example, manually obtained) classifications as particular boundary conditions.
Making a preliminary summary:
1. We reveal a structure, and it looks reasonable;
2. Structuring top posts requires an additional clustering at the level of individual posts. In fact, it can be obtained with the same methods we used for clustering bloggers; the only difference is that the procedure should be applied to the bipartite graph (post, word). Doing this is costly in both traffic and computation (more the former than the latter).
Parts of the clustering are presented in our blog http://ljstructure.livejournal.com/
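The Russian-language version of this post suggests one way to soften the hard partition: estimate a blogger's membership in each cluster from the relative number of their friends in that cluster. A minimal sketch under that assumption; the function name and the input shapes (a dict of mutual-friend sets and a dict of hard cluster labels) are hypothetical.

```python
from collections import Counter


def soft_memberships(friends, hard_cluster):
    """Share of each blogger's mutual friends that falls in each cluster.

    friends:      dict blogger -> set of mutual friends (hypothetical input shape)
    hard_cluster: dict blogger -> cluster id from the hard partition
    Friends without a cluster label (e.g. outside the top-100,000 list) are ignored.
    """
    memberships = {}
    for blogger, their_friends in friends.items():
        counts = Counter(hard_cluster[f] for f in their_friends if f in hard_cluster)
        total = sum(counts.values())
        memberships[blogger] = (
            {cluster: n / total for cluster, n in counts.items()} if total else {}
        )
    return memberships
```

A blogger whose shares are spread across several clusters would be a natural candidate for overlapping membership, which a hard partition cannot express.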
|
| Portrait of the Russian LJ - A Formal Analysis |
[Apr. 12th, 2010|02:30 pm] |
An attempt has been made to analyze the Russian LJ in the spirit of "social network analysis", based on the social graph of mutual friends. There were two motivations:
1. To have a structural picture of LJ and observe how it changes over time.
2. To move from a linear top list of posts to a tree-shaped one: a separate top list for each cluster.
Obtaining the raw data:
- Via the LiveJournal Blog Rating we get a list of the top 100 thousand bloggers. The number 100,000 was picked out of thin air, to limit traffic and computation costs (which, note, are linear in the number of bloggers);
- Via the LJ API we get the list of friends and of the communities the blogger belongs to (reads);
- Via the LJ API we get the blogger's list of "interests";
- Via the query http://username.livejournal.com/tag we get the list of tags the blogger uses.
As a result we obtained:
- Number of tags: 919,654
- Number of interests: 765,407
- Number of communities: 108,636
The extracted data can be represented as a set of graphs (User-->User), (User-->Tag), (User-->Community) or as hypergraphs (for example, (User, Tag, Community)). One of the algorithms for finding community structure can then be applied. For definiteness, we chose the undirected graph of "mutual friends". The resulting graph consists of ~100,000 nodes and ~10,050,000 edges. As the partitioning (clustering) algorithm we chose modularity maximization. The version of the algorithm we use has one free parameter, which controls the so-called resolution limit of the clustering. The resulting partition generates a factor graph (now weighted), to which the same procedure is applied again. In the end we have a hierarchical partition of the original graph. In the current version, ~500 clusters (with >=50 bloggers each) were obtained at the first level. The depth of the hierarchy tree turned out to be 4. To characterize a cluster, we used its characteristic tags ("What they write about"), interests ("Interests") and communities ("Popular communities"). The F-measure was used as the measure of how characteristic an item is. The result is a minimalist portrait of the cluster.
What is the "meaning" of the resulting clustering of the social graph? At least two types of connection along which bloggers group together are immediately visible: a common topic (an interest, a personality) and/or geography. Entire spam clusters also turned up (although these can be attributed to a specific interest). Apparently not all clusters are meaningful in these senses.
The hard nature of the clustering (each blogger belongs to a single class) is, of course, a simplification. One could speak of the probability that a blogger belongs to a particular cluster (for example, via the relative number of their friends in other clusters), or apply soft clustering algorithms from the start. The clustering procedure itself is quite flexible and allows external (manual) classifications to be added as specific boundary conditions. To sum up the first results:
1. The structure is revealed and looks reasonable.
2. Structuring the top lists requires one more clustering, this time at the level of posts. In fact, it can be done in exactly the same way, by applying the procedure described above to the bipartite graph (Post-->Word). However, this requires much greater costs in traffic (mainly) and computation. Parts of the clustering can be seen in the journal http://ljstructure.livejournal.com/
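Neither version of the post spells out how the F-measure is computed for a tag. One common reading, assumed here, treats "blogger uses tag t" as a classifier for "blogger is in cluster C": precision is the share of t's users who are in C, recall is the share of C's members who use t, and F = 2PR/(P+R). The function name and the toy data are hypothetical; the same code would rank interests or communities if given those lists instead of tags.

```python
from collections import defaultdict


def characteristic_tags(user_tags, cluster, top_k=3):
    """Rank tags by F-measure of 'uses tag t' as a predictor of 'is in cluster'.

    user_tags: dict blogger -> set of tags (hypothetical input shape)
    cluster:   iterable of bloggers in the cluster being described
    Returns up to top_k (tag, F) pairs, best first, ties broken alphabetically.
    """
    members = set(cluster)
    tag_users = defaultdict(set)
    for user, tags in user_tags.items():
        for t in tags:
            tag_users[t].add(user)
    scores = {}
    for t, users in tag_users.items():
        hits = len(users & members)
        if hits == 0:
            continue  # tag never used inside the cluster
        precision = hits / len(users)
        recall = hits / len(members)
        scores[t] = 2 * precision * recall / (precision + recall)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

The F-measure penalizes both kinds of uninformative tag: a tag used everywhere on LJ has low precision for any one cluster, and a tag used by a single member has low recall.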
|
| Sociology and online activities survey |
[Oct. 28th, 2009|12:53 pm] |
I am laying the groundwork for a sociology paper. I would like to focus on LJ, but unless I get enough responses I will have to stay more general. I will run several rounds of surveys over time, each built on the answers to the previous one; you do not have to fill them all out. If you have time now, though, could you please answer the first survey to help me out?
Click here to take the survey.
Thank you in advance, Sam
|
| Children's bad behavior - is it the parents' fault? |
[Oct. 13th, 2009|11:17 am] |
Hello all. I am doing research for an article about the behavior of today's generation of children compared with past generations. There is a higher degree of "bad" behavior, narcissism, and lack of maturity and independence as they leave their teen years. What is the cause of this trend? Is it our society overall, or a shift in parenting style that puts the child’s wants before their needs and puts the child, rather than the parent, in the driver's seat?
|
| Long shot... TOS versions? |
[Jun. 22nd, 2009|04:35 pm] |
This is a long-shot request, I know, but... is there anyone out there (or do you know of anyone) who has kept track of all the various versions of LJ's TOS? I'm curious to track its revisions. I'd be happy to put together a semi-master document tracking all the revision changes if anyone would be interested in it, too.
|
| Online Survey, relations between LJ, Inc. and LJ users |
[Jun. 18th, 2009|01:26 pm] |
(This has been crossposted a few times, my apologies if you’re seeing this several times.)
I’m currently doing research on LJ for my master’s thesis in applied anthropology. Specifically, I'm looking at relations between LJ, Inc. and LJ members: where the relationship breaks down, how one side sees the other, questions of ownership on the site, profitability/ads, etc.
To that end, I have a survey I’d like as many LJ-ers as possible to take, so I can get a good idea of how the userbase feels about these issues. The survey is here, and it should take at most 10 or 15 minutes to fill out. I am posting this to several communities; if you know of a community I should post this in, either comment here and let me know or feel free to repost this message in its entirety to that community (or to your own LJ!).
Thanks so much for your help!!
|
| LJ visualization |
[Apr. 7th, 2009|10:34 pm] |
Hi all,
Recently I have been doing research on drawing scale-free graphs. One of the results is a graph drawing algorithm capable of drawing graphs with several million nodes. I used this algorithm to build a visualization of the LiveJournal friendship graph.