Smile – you’re on social media

We will be using artificial neural networks in our latest project, SMILE.

We successfully bid to the Arts & Humanities Research Council for a project using Social Media to Identify and Leverage Engagement and now we’ll be working with University of Cambridge Museums and Visual Arts Southwest.

We’re putting our best brains on it: robot ones. “We’ll be using artificial neural networks to analyse social mood,” said i-DAT’s Mike Phillips.
The project extends the work we’ve done in partnership to develop Qualia – our sentiment analysis app that measures arts and culture audience mood and incentivises audiences to leave their emotional feedback.

For SMILE, we’ll be bringing in international expertise and cross-disciplinary working, spanning arts technology, communication, sociology and computer science, to deliver new insights about social media analytics and develop an open-source sentiment analysis tool with improved accuracy, ‘calibrated to the arts and culture discourse’.

Introduction

Our implementation focuses on extracting features from the raw data while taking into account the temporal aspects of the problem. We merge ideas from deep neural networks (DNNs) and recurrent neural networks (RNNs), aiming for a system that self-organises its representation of the data and accommodates the temporality of language. The system should work as an encoder, transforming and compacting the data in whatever way best suits the data. It is important to note the freedom this gives the system: we make no interventions or assumptions about how knowledge should be organised or extracted, other than what the data themselves impose.

Our data consist of tweets, geolocation, timestamps and so on, collected from different festivals around the UK, and our goal is to extract interesting features from them. We believe that, given the amount of data we have, emergent properties could help explain the data or even provide meaningful insights into how the data could be manipulated. Given that the data are closely correlated with the behaviour of festival attendees, a reverse procedure could influence their behaviour and their reaction to events organised by the festival.

Background

Conventional machine learning techniques have limitations in their ability to process raw data.

Implementing such methods often requires domain expertise and delicate engineering. Deep learning algorithms, on the other hand, have shown another way forward: representation learning allows suitable representations to be discovered from the raw data.

The data pass through multiple nonlinear layers, and each layer transforms them into a different representation, taking as input the output of the layer below. Thanks to the distributed way of encoding the raw input, the multiple levels of representation and the power of composition, deep networks have shown promising results in a variety of applications and established new records in speech recognition and image recognition.
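The layer-by-layer transformation described above can be sketched in a few lines of plain Python. This is only an illustration: the layer sizes, the `tanh` nonlinearity and the random, untrained weights are assumptions chosen for the example, not details of the SMILE system.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """Apply one nonlinear layer: tanh(W x + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Create small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    """Feed the output of each layer to the layer above, keeping every representation."""
    representations = [inputs]
    for weights, biases in layers:
        representations.append(dense_layer(representations[-1], weights, biases))
    return representations


rng = random.Random(0)
layer_sizes = [6, 4, 3, 2]  # hypothetical sizes: raw input first, top layer last
layers = [
    random_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

raw_input = [0.2, -0.1, 0.7, 0.0, 0.5, -0.3]
reps = forward(raw_input, layers)
for depth, rep in enumerate(reps):
    print(depth, len(rep))
```

Each entry of `reps` is a different representation of the same input, and each one is computed only from the representation directly below it.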

By pretraining layers like these, of gradually more complicated feature extractors, the weights of the network can be initialised to “good” values. After an extra layer is added, the whole system can be trained and fine-tuned with standard backpropagation. The hidden layers of a multilayer neural network learn to represent the network’s inputs in a way that makes it easier to predict the target outputs. This is nicely demonstrated by training a multilayer neural network to predict the next word in a sequence from a local context of preceding words.

When trained to predict the next word in a news story, for example, the learned word vectors for Tuesday and Wednesday are very similar, as are the word vectors for Sweden and Norway. Such representations are called distributed representations because their elements (the features) are not mutually exclusive and their many configurations correspond to the variations seen in the observed data. These word vectors are composed of learned features that were not determined ahead of time by experts, but automatically discovered by the neural network. Vector representations of words learned from text are now very widely used in natural language applications.
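Similarity between word vectors is commonly measured with cosine similarity. The sketch below shows the calculation; the three-dimensional vectors are made-up toy values chosen for illustration, not vectors learned by any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; real word vectors have hundreds of dimensions.
toy_vectors = {
    "tuesday": [0.9, 0.1, 0.0],
    "wednesday": [0.8, 0.2, 0.1],
    "sweden": [0.1, 0.9, 0.3],
}

sim_days = cosine_similarity(toy_vectors["tuesday"], toy_vectors["wednesday"])
sim_mixed = cosine_similarity(toy_vectors["tuesday"], toy_vectors["sweden"])
print(sim_days > sim_mixed)
```

With learned vectors, the same comparison is what places Tuesday near Wednesday and far from unrelated words.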

Another type of network that has shown interesting results is the Recurrent Neural Network (RNN). RNNs try to capture the temporal aspects of the data fed to them by considering multiple time steps of the data in their processing. Thanks to advances in their architecture [9,10] and ways of training them [11,12], RNNs have been found to be very good at predicting the next character in a text [13] or the next word in a sequence [7], but they can also be used for more complex tasks. For example, after reading an English sentence one word at a time, an English ‘encoder’ network can be trained so that the final state vector of its hidden units is a good representation of the thought expressed by the sentence.

Despite their flexibility and power, DNNs can only be applied to problems whose inputs and targets can be sensibly encoded as vectors of fixed dimensionality. This is a significant limitation, since many important problems, such as speech recognition and machine translation, are best expressed with sequences whose lengths are not known a priori.
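The ‘encoder’ idea can be illustrated with a very small recurrent network that reads a sequence one vector at a time and returns its final hidden state. This is a simplified Elman-style sketch with random, untrained weights; the sizes and the toy input vectors are assumptions for the example.

```python
import math
import random


def rnn_encode(sequence, w_in, w_rec, bias):
    """Read the sequence step by step and return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in sequence:
        # The new state depends on the current input and the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


rng = random.Random(1)
n_input, n_hidden = 3, 4  # hypothetical sizes
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_input)] for _ in range(n_hidden)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
bias = [0.0] * n_hidden

# Two "sentences" of different lengths, each word a toy 3-dimensional vector.
short_sentence = [[0.1, 0.2, 0.3], [0.0, -0.4, 0.5]]
long_sentence = short_sentence + [[0.7, 0.1, -0.2], [0.3, 0.3, 0.3], [-0.6, 0.0, 0.1]]

code_short = rnn_encode(short_sentence, w_in, w_rec, bias)
code_long = rnn_encode(long_sentence, w_in, w_rec, bias)
print(len(code_short), len(code_long))
```

Both sequences end up as vectors of the same length, whatever the number of words; training is what would make that vector a useful summary.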

Method

For the preprocessing of tweets, we worked with unsupervised techniques. For the encoding of the tweets we focused on natural language processing and used word embeddings to represent the words in the tweets. This way we can capture linguistic regularities found in our training sentences (festive tweets), with related words placed close together in a high-dimensional feature space. In our case the feature space has between 200 and 500 dimensions. For the word embeddings we used Google’s “word2vec” tool, which provides a fast and reliable implementation of two algorithms, continuous bag-of-words and continuous skip-gram [6, 7, 8].
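The difference between the two algorithms lies in what is predicted from what. The sketch below builds the training examples each one would see from a tokenised tweet. It is a simplification with a fixed context window, written in plain Python rather than with the word2vec tool itself; the function names and the example tweet are hypothetical.

```python
def context_windows(tokens, window=2):
    """Yield (centre_word, context_words) for each position in a tokenised tweet."""
    for i, centre in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield centre, left + right


def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the centre word."""
    return [
        (centre, ctx)
        for centre, context in context_windows(tokens, window)
        for ctx in context
    ]


def cbow_examples(tokens, window=2):
    """Continuous bag-of-words: predict the centre word from its context."""
    return [(context, centre) for centre, context in context_windows(tokens, window)]


tweet = "great music at the festival".split()
pairs = skipgram_pairs(tweet, window=1)
examples = cbow_examples(tweet, window=1)
print(pairs[:3])
print(examples[1])
```

Skip-gram produces one training pair per (centre, context word) combination, while continuous bag-of-words produces one example per position in the tweet.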

At the same time, using the same library, we are able to extract and learn phrases in our dataset of tweets. This way we can identify ‘san francisco’ and encode it as a single vector, where otherwise ‘san’ and ‘francisco’ would be represented as two separate vectors. Being an unsupervised method, this needs a great amount of data to train properly. The Qualia API captures a lot of data, but not quite enough. For that reason, we train the word embeddings on a large corpus (the first billion characters of the latest Wikipedia dump) in addition to the data provided by the Qualia API.
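Phrase detection of this kind can be sketched with the bigram score described in [7]: score(a, b) = (count(a b) − δ) / (count(a) × count(b)), where pairs scoring above a threshold are treated as a single token. The code below is a plain-Python illustration on a made-up corpus, not the word2vec tool's own implementation; the `delta` and `threshold` values are assumptions picked to suit the toy data.

```python
from collections import Counter


def find_phrases(sentences, delta=1, threshold=0.2):
    """Score every adjacent word pair and keep those above the threshold."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    phrases = {}
    for (a, b), n_ab in bigrams.items():
        # delta discounts pairs that co-occur only rarely.
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


def merge_phrases(tokens, phrases):
    """Join detected phrases into single tokens, e.g. 'san_francisco'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical toy corpus standing in for the tweet dataset.
corpus = [
    "san francisco is sunny".split(),
    "we love san francisco".split(),
    "we love the music".split(),
    "the crowd is happy".split(),
    "we saw the crowd in san francisco".split(),
]

phrases = find_phrases(corpus)
merged = merge_phrases("we love san francisco".split(), phrases)
print(merged)
```

Once merged, ‘san_francisco’ is an ordinary vocabulary item and receives its own vector during embedding training.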

To pass the tweets to the network we need to preprocess them, keeping in mind that we need an encoding of a fixed length. We do so by passing the tweets through an RNN-RBM [1], a recurrent neural network combined with a restricted Boltzmann machine. The RNN-RBM is an energy-based model for density estimation of temporal sequences. This way we are able to maintain information about the temporal relations of words and phrases in tweets. We are also able to find, in the hidden layer of the recurrent part, a fixed-length representation of each tweet. We want to feed this representation to the network as the encoded version of the tweet, together with any other aligned information we have for that event, from that user or at that time.

We hope that, given the feature extraction capabilities of the networks, important features of the data will emerge. At the same time, given the bidirectional nature of both mechanisms, we will be able to create exemplar objects of the important features extracted.
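For reference, the RNN-RBM of [1] can be summarised by the equations below. The symbol names follow the common presentation of that model and are assumptions here, since the text above does not fix a notation: v is the visible vector (here, a word or phrase embedding), h the RBM hidden units, and u the recurrent hidden state.

```latex
% Energy of an RBM over visible units v and hidden units h
E(v, h) = -b_v^{\top} v - b_h^{\top} h - h^{\top} W v

% At time step t, the RBM biases depend on the previous recurrent state
b_v^{(t)} = b_v + W_{uv}\, u^{(t-1)}
b_h^{(t)} = b_h + W_{uh}\, u^{(t-1)}

% The recurrent state is updated from the current input
u^{(t)} = \tanh\left( b_u + W_{uu}\, u^{(t-1)} + W_{vu}\, v^{(t)} \right)
```

Under this reading, the state u after the last word of a tweet has the same dimensionality whatever the tweet's length, which is what makes it usable as a fixed-length encoding.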

Given the amount of data needed and the fact that the system should be able to work in real time, we also implemented a Python API bound to the Qualia v1 API. Because it can fetch tweets and process them in parallel, this mechanism provides enough throughput for the algorithm itself, which runs in a massively parallel fashion on GPUs using Theano-accelerated Python scripts.
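The fetch-and-process-in-parallel pattern can be sketched with Python's standard `concurrent.futures` module. Everything specific here is hypothetical: `fetch_tweets` is a stand-in that returns canned text instead of calling the Qualia API, and the tokenisation is deliberately naive.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_tweets(page):
    """Hypothetical stand-in for an API request; returns canned tweets, no network."""
    return [f"Tweet {page}-{i} from the FESTIVAL!" for i in range(3)]


def preprocess(tweet):
    """Lowercase and split on whitespace (a deliberately naive tokeniser)."""
    return tweet.lower().split()


def fetch_and_preprocess(page):
    """Fetch one page of tweets and tokenise each of them."""
    return [preprocess(tweet) for tweet in fetch_tweets(page)]


def load_batches(pages, max_workers=4):
    """Process several pages concurrently; results come back in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_and_preprocess, pages))


batches = load_batches(range(4))
print(len(batches), batches[0][0])
```

Threads suit this kind of work because most of the time is spent waiting on network responses; the resulting batches can then be handed to the GPU-side code.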

References

1. Boulanger-Lewandowski, N. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv Preprint arXiv: …. Retrieved from http://arxiv.org/abs/1206.6392
2. Pak, A., & Paroubek, P. (2010). Twitter as a Corpus for Sentiment Analysis and Opinion Mining. LREC, 1320–1326. Retrieved from http://incctps.googlecode.com/svn/trunk/TPFinal/bibliografia/Pak and Paroubek (2010). Twitter as a Corpus for Sentiment Analysis and Opinion Mining.pdf
3. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
4. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507.
5. Carreira-Perpinan, M., & Hinton, G. (2005). On contrastive divergence learning. … of the Tenth International Workshop on …, 0. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:On+Contrastive+Divergence+Learning#0
6. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR.
7. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS.
8. Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT.
9. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.
10. El Hihi, S., & Bengio, Y. (1995). Hierarchical recurrent neural networks for long-term dependencies. In Proc. Advances in Neural Information Processing Systems 8. Retrieved from http://papers.nips.cc/paper/1102-hierarchical-recurrent-neural-networks-for-long-term-dependencies
11. Sutskever, I. (2012). Training Recurrent Neural Networks. PhD thesis, Univ. Toronto.
12. Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In Proc. 30th International Conference on Machine Learning, 1310–1318.
13. Sutskever, I., Martens, J., & Hinton, G. E. (2011). Generating text with recurrent neural networks. In Proc. 28th International Conference on Machine Learning, 1017–1024.

Chris Melidis.

Dr Eric Jensen, Associate Professor in the Department of Sociology at the University of Warwick, is a widely published researcher in the field of public engagement and is the Principal Investigator leading this project. Other team members include co-investigator Dr Maria Liakata, Assistant Professor at the Department of Computer Science at the University of Warwick; Professor Mike Phillips; researcher and i-DAT developer Chris Hunt; Chris Melidis; and research consultant Dr David Ritchie, Professor of Communication at Portland State University.

https://warwick.ac.uk/fac/soc/sociology/staff/jensen/ericjensen/smile/workshop/

https://warwick.ac.uk/fac/soc/sociology/staff/jensen/ericjensen/smile/