What happened to the average length of tweets?

The recent doubling of the maximum tweet length offers a fascinating opportunity to examine how a relaxation of length constraints affects linguistic messaging. More importantly, how did the character limit change (CLC) affect the structure and word usage of tweets?

The need for economy of expression decreased post-CLC. Therefore, our first hypothesis states that post-CLC tweets contain relatively fewer textisms, such as abbreviations, contractions, symbols, or other ‘space-savers’. In addition, we hypothesize that the CLC affected the part-of-speech (POS) structure of the tweets, such that post-CLC tweets contain relatively more adjectives, adverbs, articles, conjunctions, and prepositions. These POS categories carry additional information about the situation being described (the referential situation), such as properties of entities, the temporal order of events, locations of events or objects, and causal connections between events (Zwaan and Radvansky, 1998). This structural change also entails that sentences will be longer, with more words per sentence.

Gligoric et al. (2018) compared pre-CLC and post-CLC tweets with a length of approximately 140 characters. They found that pre-CLC tweets in this character range contained relatively more abbreviations and contractions, and fewer definite articles. In the current study, we used a different approach that adds complementary value to these earlier findings: we performed a content analysis on a dataset of approximately 1.5 million Dutch tweets covering all length ranges (i.e., 1–140 and 1–280), instead of selecting tweets within a specific character range. The dataset comprises Dutch tweets that were created in the 2 weeks before and the 2 weeks after the CLC.

We performed a general analysis to investigate changes in the number of characters, words, sentences, emojis, punctuation marks, digits, and URLs. To test the first hypothesis, we performed token and bigram analyses to detect changes in the relative frequencies of tokens (i.e., individual words, punctuation marks, numbers, special characters, and symbols) and bigrams (i.e., two-word sequences). These changes in relative frequencies could then be used to extract the tokens that were particularly affected by the CLC. In addition, a POS analysis was performed to test the second hypothesis; that is, whether the CLC affected the POS structure of the sentences. An example of each analyzed POS category is presented in Table 1.
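The relative-frequency comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (they report using R packages): the whitespace tokenizer, the function names, and the toy tweets are assumptions made for illustration, and the plain difference in relative frequency stands in for whatever statistic the study actually applied.

```python
from collections import Counter


def relative_frequencies(tweets, n=1):
    """Relative frequency of each n-gram (n=1: tokens, n=2: bigrams)."""
    counts = Counter()
    for tweet in tweets:
        # Naive lowercase whitespace tokenizer (an assumption, not the study's).
        tokens = tweet.lower().split()
        # Slide a window of size n over the tokens of a single tweet.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def frequency_changes(pre, post, n=1):
    """Post-minus-pre change in relative frequency, sorted ascending."""
    pre_rf = relative_frequencies(pre, n)
    post_rf = relative_frequencies(post, n)
    grams = set(pre_rf) | set(post_rf)
    changes = {g: post_rf.get(g, 0.0) - pre_rf.get(g, 0.0) for g in grams}
    return sorted(changes.items(), key=lambda item: item[1])


# Hypothetical toy data, not taken from the study's dataset.
pre_tweets = ["u r gr8", "c u l8r"]
post_tweets = ["you are great", "see you later"]

token_changes = dict(frequency_changes(pre_tweets, post_tweets, n=1))
bigram_changes = dict(frequency_changes(pre_tweets, post_tweets, n=2))
```

Each n-gram is keyed as a tuple, so a single token appears as a one-element tuple such as `("u",)`. Tokens with the most negative change are candidates for textisms that became less frequent post-CLC.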


The data collection, pre-processing, quantitative data analysis, token analysis, bigram analysis, and POS analysis were performed using RStudio (RStudio Team, 2016). The R packages that were used are: ‘BSDA’, ‘dplyr’, ‘ggplot’, ‘grid’, ‘kableExtra’, ‘knitr’, ‘lubridate’, ‘NLP’, ‘openNLP’, ‘quanteda’, ‘R-base’, ‘rtweet’, ‘stringr’, ‘tidytext’, ‘tm’ (Arnholt and Evans, 2017; Benoit, 2018; Feinerer and Hornik, 2017; Grolemund and Wickham, 2011; Hornik, 2016; Hornik, 2017; Kearney, 2017; R Core Team, 2018; Silge and Robinson, 2016; Wickham, 2016; Wickham, 2017; Xie, 2018; Zhu, 2018).

Period of interest

The CLC took place on … at … a.m. (UTC). The dataset comprises Dutch tweets that were created within two weeks pre-CLC and two weeks post-CLC (i.e., from 10-25-2017 to 11-21-2017). This period is subdivided into week 1, week 2, week 3, and week 4 (see Fig. 1). To analyze the effect of the CLC, we compared the language usage in ‘week 1 and week 2’ with the language usage in ‘week 3 and week 4’. To distinguish the CLC effect from natural-event effects, a control comparison was devised: the difference in language usage between week 1 and week 2, referred to as Baseline-split I. Furthermore, the CLC may have initiated a trend in language usage that evolved as more users became familiar with the new limit. This trend may be revealed by comparing week 3 with week 4, referred to as Baseline-split II.
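The week subdivision and the three comparisons can be sketched in Python as follows. This is an illustrative sketch and not the authors' R code. It assumes four consecutive 7-day windows starting at 00:00 UTC on 10-25-2017, so the boundary between week 2 and week 3 in this sketch need not coincide exactly with the moment of the CLC. The names `week_of`, `COMPARISONS`, and `split_for` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Start of the period of interest (10-25-2017, from the text); midnight UTC is assumed.
PERIOD_START = datetime(2017, 10, 25, tzinfo=timezone.utc)

# Each comparison maps to (weeks on the 'before' side, weeks on the 'after' side).
COMPARISONS = {
    "Baseline-split I": ({1}, {2}),
    "Baseline-split II": ({3}, {4}),
    "CLC": ({1, 2}, {3, 4}),
}


def week_of(created_at):
    """Return the week number (1-4) of a timezone-aware timestamp, else None."""
    index = (created_at - PERIOD_START) // timedelta(days=7)
    return index + 1 if 0 <= index < 4 else None


def split_for(comparison, tweets):
    """Split (timestamp, text) pairs into the two sides of a comparison."""
    before_weeks, after_weeks = COMPARISONS[comparison]
    before = [text for ts, text in tweets if week_of(ts) in before_weeks]
    after = [text for ts, text in tweets if week_of(ts) in after_weeks]
    return before, after


# Hypothetical example tweets, one per week.
example = [
    (datetime(2017, 10, 26, 9, 0, tzinfo=timezone.utc), "tweet a"),
    (datetime(2017, 11, 2, 9, 0, tzinfo=timezone.utc), "tweet b"),
    (datetime(2017, 11, 9, 9, 0, tzinfo=timezone.utc), "tweet c"),
    (datetime(2017, 11, 16, 9, 0, tzinfo=timezone.utc), "tweet d"),
]
clc_before, clc_after = split_for("CLC", example)
```

Timestamps must be timezone-aware, because subtracting a naive `datetime` from `PERIOD_START` raises a `TypeError`.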

Moving average and standard error of the character usage over time, which shows an increase in character usage post-CLC and an additional increase between week 3 and week 4. Each tick marks the absolute beginning of the day (i.e., … a.m.). The time frames indicate the comparative analyses: week 1 with week 2 (Baseline-split I), week 3 with week 4 (Baseline-split II), and weeks 1 and 2 with weeks 3 and 4 (CLC).
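A moving average with its standard error, as plotted in the figure, can be computed along the following lines. This Python sketch is only an illustration: the window size, the function name `moving_average_with_se`, and the sample values are assumptions, since the source does not state how the curve was produced.

```python
from math import sqrt
from statistics import mean, stdev


def moving_average_with_se(values, window):
    """Return (mean, standard error) for each full window over `values`.

    `values` is assumed to hold per-tweet character counts in time order.
    """
    results = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        # Standard error of the mean: sample standard deviation / sqrt(n).
        se = stdev(chunk) / sqrt(window) if window > 1 else 0.0
        results.append((mean(chunk), se))
    return results


# Hypothetical character counts, not taken from the dataset.
smoothed = moving_average_with_se([10, 20, 30, 40], window=2)
```

A larger window gives a smoother curve at the cost of temporal resolution, which matters when looking for a change at a specific moment such as the CLC.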