Machine learning and sentence sentiment

Siri does a lot. But most of what Siri picks up follows some well-defined rules.

“Siri, remind me to pick up some milk on the way home.”

That sentence has primary and secondary verbs, as well as words that represent objects and locations. But this type of analysis is only the tip of the iceberg in terms of natural language understanding.

A team at Stanford is working on the problem of neural analysis of sentiment.

During the summer, the scientists started from a dataset of roughly 12,000 movie review sentences. They split these sentences into phrases, using automated techniques to “parse” groups of words into grammatical units of meaning. The result was 214,000 phrases and sentences. Each of these was read by three humans, who rated it for intensity of like or dislike.

Computer scientists call this labeling the data.
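To make the labeling step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the phrases, the 0.0 to 1.0 rating scale, and the choice to average the three human ratings into one label are mine, not details from the Stanford team.

```python
from statistics import mean

# Hypothetical ratings: each phrase scored by three human readers on an
# assumed scale from 0.0 (strong dislike) to 1.0 (strong like).
ratings = {
    "a wonderful film": [0.9, 0.8, 1.0],
    "wonderful": [0.8, 0.9, 0.7],
    "not very good": [0.3, 0.2, 0.25],
}

def label_phrases(ratings):
    """Collapse each phrase's three ratings into a single label (assumed: the mean)."""
    return {phrase: mean(scores) for phrase, scores in ratings.items()}

labels = label_phrases(ratings)
for phrase, label in labels.items():
    print(f"{phrase!r}: {label:.2f}")
```

Note that both the full phrase and the single word inside it get their own label, which is the point of splitting sentences into phrases first.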

Using the Stanford team’s NaSent algorithm, the machine “studied” this labeled data the way a student might study a grammar text.

Or, to be more accurate, the Deep Learning system assigned each labeled expression a set of mathematical attributes. Computer scientists call these numerical descriptions “feature representations.” They are roughly analogous to the concepts and definitions we understand as human beings.
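A toy version of a “feature representation” might look like the sketch below. It is not the NaSent algorithm: the two-dimensional vectors are made up, averaging word vectors is a deliberately simple stand-in for whatever composition the real system learns, and the scoring weights are assumed rather than trained.

```python
# Hypothetical 2-dimensional feature vectors; a real system would learn
# many more dimensions from the labeled data.
word_vectors = {
    "great": [0.9, 0.1],
    "dull": [-0.8, 0.2],
    "movie": [0.0, 0.5],
}

def phrase_vector(words):
    """Average the word vectors (a simple stand-in for a learned composition)."""
    vectors = [word_vectors[w] for w in words]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def sentiment_score(vector, weights=(1.0, 0.0)):
    """Dot product with assumed weights; a positive score means 'like'."""
    return sum(v * w for v, w in zip(vector, weights))

great_movie = phrase_vector(["great", "movie"])
dull_movie = phrase_vector(["dull", "movie"])
print(sentiment_score(great_movie), sentiment_score(dull_movie))
```

The idea to take away is only that a phrase becomes a list of numbers, and that sentiment is read off those numbers rather than off the words themselves.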

This kind of analysis will move the ball forward and help make natural language systems like Siri much more sophisticated. Fascinating stuff.