Abstract:
Millions of online posts about different topics and products are
shared on popular social media platforms. One use of this content is to
provide crowd-sourced information about a specific topic, event, or product.
However, this use raises an important question: what percentage of the
information available through these services is trustworthy? In particular,
might some of this information be generated by a machine, i.e., a ``bot''
instead of a human? Bots can be, and often are, purposely designed to
generate enough volume to skew an apparent trend or position on a topic, yet
the consumer of such content cannot easily distinguish a bot post from a
human post. This paper introduces a new model that uses Bidirectional Encoder
Representations from Transformers (Google's BERT) for sentiment classification
of tweets to identify topic-independent features for the social media bot
detection model. Deriving topic-independent features through a natural
language processing approach distinguishes this
work from previous bot detection models. We achieve 94\% accuracy in
classifying the contents of the Cresci data set
\cite{cresci-etal-2017-paradigm} as generated by a bot or a human, whereas
the most accurate prior work achieved an accuracy of 92\%.