An overwhelming amount of data is created online every day – on average, in one minute of the day, we send 204 million emails, share 2.46 million pieces of content on Facebook, and post 277,000 Tweets. This incredible amount of data gives organisations an opportunity to tap into what customers are thinking and saying. However, extracting and analysing such data poses a significant challenge: searching and interpreting it manually would be infeasible. This is where text mining comes in.
This week’s post takes a brief look at what text mining is, how organisations can use it, and the features it typically offers. We then look at one particular tool – SAS Text Miner – examining its advantages and disadvantages, as well as key aspects to look out for when conducting text mining.
What is text mining?
Text mining involves the automatic extraction of information from textual documents, uncovering underlying themes and concepts to derive insights. Rather than conducting analysis on pre-structured databases, text mining examines unstructured data and applies certain rules in order to convert it into a form of structured data. Examples of such data include emails, research papers, surveys, and web content. Patterns of text are automatically identified from the data as topics and themes, which helps to define the relationships between terms and phrases.
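To make the idea of turning unstructured text into structured data a little more concrete, here is a minimal sketch in Python. It builds a simple term-document matrix (one row per document, one column per term) using only the standard library. The function names and sample comments are our own illustration, not how SAS Text Miner or any other particular tool works internally.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical customer comments, used only for illustration.
documents = [
    "The delivery was late and the box was damaged.",
    "Great service, fast delivery!",
]

vocabulary, rows = term_document_matrix(documents)
print(vocabulary)
for doc, row in zip(documents, rows):
    print(row, doc)
```

Each row is now a structured record that can be counted, clustered, or fed into a model. Real tools add steps such as stemming and stop-word removal on top of this.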
How organisations can use text mining
Organisations can implement text mining to gain greater insight into their unstructured data. In the digital environment we operate in today, approximately 80% of data is unstructured. Whilst the amount of textual data continues to grow, businesses continually face challenges in understanding and utilising this information. Text mining tools work to address this problem by employing natural language processing to interpret jargon, abbreviations, misspellings, and colloquial expressions. Typical features offered by text mining tools include extraction of key terms and phrases, conversion of data into structured representations, topic identification, clustering, sentiment analysis, and predictive modelling. While these tools aren’t perfect just yet, text mining enables organisations to extract information from unstructured data, allowing them to gain a more holistic view of their operations and customers. For example, businesses can analyse comments left by customers on their social media pages to determine customer sentiment, as well as recurring themes and terms, which can help them address sources of dissatisfaction more efficiently.
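As a rough illustration of the sentiment analysis mentioned above, the sketch below scores a comment by counting words from two small word lists. The word lists, function names, and comments are hypothetical and deliberately tiny. Commercial tools rely on much richer lexicons and language models, so treat this as a toy example under those assumptions.

```python
import re

# Hypothetical, deliberately tiny word lists; a real lexicon would be far larger.
POSITIVE = {"great", "fast", "helpful", "love"}
NEGATIVE = {"late", "damaged", "slow", "rude"}


def sentiment_score(comment):
    """Return the number of positive words minus the number of negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def sentiment_label(comment):
    """Map the score onto a coarse positive / negative / neutral label."""
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in [
    "Great service, fast delivery!",
    "The delivery was late and the box was damaged.",
    "It arrived on Tuesday.",
]:
    print(sentiment_label(comment), "|", comment)
```

A word-counting approach like this misses negation ("not great") and sarcasm, which is one reason these tools aren’t perfect just yet.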
SAS Text Miner
With 31 years of experience and more than 43,000 customers worldwide, SAS is considered a leader in Business Intelligence platforms, making it well placed to provide organisations with the tools to turn data into actionable insights. One of its tools – SAS Text Miner – is used to uncover and extract knowledge from textual documents, and is sold as an add-on to its Enterprise Miner tool.
The benefits of SAS Text Miner are also recognised in industry, and include the following:
- Easy text importing
- Automatic rule generation, which helps users understand how the results were derived
- Profiling and trends of terms
- Document theme discovery
- Handles large datasets well
- Integrates well with other tools, with the ability to process a variety of data formats, as well as compatibility on several operating systems
- User-customised and default synonym lists
- Visualisations, such as concept linking diagrams
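To give a feel for what a synonym list does, here is a small Python sketch that maps different spellings and variants of a term onto one canonical term before any counting takes place. The synonym entries and the function name are hypothetical. This is not SAS Text Miner's format or implementation, only an illustration of the general idea.

```python
import re

# Hypothetical user-defined synonym list: variant -> canonical term.
SYNONYMS = {
    "pc": "computer",
    "laptop": "computer",
    "notebook": "computer",
    "thx": "thanks",
    "cheers": "thanks",
}


def normalise_terms(text, synonyms=SYNONYMS):
    """Tokenize the text and replace each variant with its canonical term."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [synonyms.get(token, token) for token in tokens]


print(normalise_terms("Thx, my laptop works"))
```

Grouping variants this way means that "laptop" and "PC" are counted as the same concept, which tends to make term frequencies and themes easier to interpret.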
Challenges of text mining and SAS Text Miner
As previously mentioned, SAS Text Miner offers numerous potential benefits for organisations looking to extract information from unstructured data sources. However, SAS has received criticism for its steep learning curve in comparison to other text analysis tools, such as R and SPSS, as well as for its poor graphing capabilities. Although SAS Text Miner can process unstructured text fields, lengthy text can result in unclear data processing and add further complexity to the analysis. In addition, as SAS Text Miner cannot be purchased as a stand-alone product, it may be a costly investment, particularly for organisations that do not already run other SAS tools. Whilst Text Miner is a comprehensive tool suitable for large datasets, it may be less suitable for more straightforward analyses that require only a simple assessment of a small dataset, which may be done faster and more cheaply by an experienced employee.
With regard to text mining in general, there may be a challenge in retrieving the data (e.g. from social media sites, rather than internal documents or data) and in the subsequent ‘cleaning’ of the data for use by the tool. Clustering may also pose some issues if the data set is highly heterogeneous or contains documents with unusual spelling and grammar, as this could produce ‘noisy’ results. Some preliminary examination and cleaning of the data could therefore help prevent these issues.
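The sort of preliminary cleaning described above might look something like the following Python sketch for social media posts. The specific rules (dropping links, keeping the word inside hashtags and mentions, removing punctuation) and the function name are our own assumptions. The right rules depend on the data and on the tool being used, and this version assumes English text in plain ASCII.

```python
import re


def clean_post(text):
    """Apply a few simple, illustrative cleaning rules to a social media post."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)      # drop links
    text = re.sub(r"[@#](\w+)", r"\1", text)       # keep the word in @mentions and #hashtags
    text = re.sub(r"[^a-z0-9\s']", " ", text)      # drop remaining punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()       # collapse repeated whitespace


# Hypothetical post, used only for illustration.
print(clean_post("LOVE the new app!!! https://example.com #happy @Acme"))
```

Cleaning like this reduces the number of spurious "terms" (links, stray symbols) that would otherwise add noise to clustering results.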
Text mining has the potential to add significant value to the data analytics suite of an organisation. Failing to explore the insights that unstructured data has to offer could lead to a competitive disadvantage, as more and more companies are adopting text mining. As always, we want to hear your thoughts on the topic – is text mining overrated? Or do you think the wealth of unstructured data has potentially greater value than the structured data we are used to? Let us know in the comments below!