A Practical Guide to Big Data for Hotels
Big data has become a big buzzword. Like any buzzword, all of the talk about big data has created big confusion in the marketplace, and it can be easier to tune it out than to take it on. The reality is, whether you want to call it big data or not, there are now new opportunities to use data to drive decision making and, ultimately, competitive advantage.
Identifying these opportunities and understanding what to do about them is the challenge facing hotel managers today, particularly in revenue management. It is time for some plain speaking and practical advice about this complex phenomenon.
What is Big Data? A Reminder
Gartner defines big data as occurring "when the volume, variety and velocity of data exceeds an organization's storage or compute capacity for accurate and timely decision making" (Gartner 3-D Data Management 2001). Big data has become a big deal not just because we suddenly have a lot more data, but because the technology to capture, store and analyze that data is now not only available, but also accessible.
Innovations in technology have dramatically improved the speed at which data is gathered and processed, and driven down the cost of data storage as well. Big data, therefore, is not a singular thing, but represents a variety of opportunities for organizations to improve business and drive innovation.
Hospitality transactional data sets are by no means as large as those of an online retailer or a credit card company, but in many cases they have started to stretch the limits of the legacy technology environment. Reports and analysis get bogged down, and sacrifices are made both in how much data is stored and in the analytics run against it. This is certainly a missed opportunity. However, in my opinion, the biggest challenge that hospitality companies face lies in the variety and velocity parts of the definition. Useful, even critical, information is coming to us in a variety of new formats, many of which we have not had to deal with before, like text data from reviews, click-stream data from web interactions, or location data. Integrating this unstructured data into a traditional relational database is difficult, if not impossible. Further, much of this data, like tweets and location, goes stale nearly as soon as it is created. If you don't have a mechanism in place for taking advantage of these fast-moving data sources, opportunities will be missed.
Big Data and Technology
Harnessing big data will require changes to the technology environment, both in data storage and in the execution of analytics. To handle the volume, variety and velocity of big data, the database needs to be flexible, cheap, scalable and fast. Probably the best-known answer to the big data storage challenge is Hadoop. Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing. Below, I compare Hadoop with traditional relational databases against each of these requirements: flexible, cheap and scalable, and fast.
Flexible - Relational databases require a pre-defined data schema, which defines not only what data is stored and the appropriate format of that data, but also how that data relates to the rest of the data in the database. Adding new data or adjusting relationships requires a lot of work. This is ideal for structured data, like sales information or customer profiles, but poorly suited to unstructured data like text or click-stream data. Unstructured data stores, like Hadoop, do not require a pre-defined schema. Without one, any kind of data can be "dumped" into the store and retrieved again, without changing a schema or imposing any structure on the data source; structure is applied later, when the data is read (an approach often called "schema-on-read"). It is easy to add a new data source, especially an unstructured one. This is a real advantage when you are not yet entirely sure how you want to use that data.
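To make the schema-on-read idea concrete, here is a minimal Python sketch. It does not use Hadoop at all: a plain list of JSON strings stands in for the data store, and the record fields (`source`, `hotel_id`, `nights`, `rate` and so on) are hypothetical. The point is only that nothing is checked when data is stored, and each reader imposes the structure it needs at read time.

```python
import json

# Hypothetical raw records from three different sources; they share no common schema.
raw_lines = [
    '{"source": "review", "hotel_id": 101, "text": "Great location, slow check-in"}',
    '{"source": "clickstream", "session": "a9f3", "page": "/rooms/king", "ms_on_page": 5400}',
    '{"source": "booking", "hotel_id": 101, "nights": 3, "rate": 189.0}',
]

def store(lines):
    """Load step: keep every record exactly as it arrived, with no schema to check against."""
    return list(lines)

def read_bookings(stored):
    """Read step: impose structure only now, keeping just the fields a booking report needs."""
    bookings = []
    for line in stored:
        record = json.loads(line)
        if record.get("source") == "booking":
            bookings.append((record["hotel_id"], record["nights"], record["rate"]))
    return bookings

def read_review_text(stored):
    """A different reader interprets the same stored data in its own way."""
    records = (json.loads(line) for line in stored)
    return [r["text"] for r in records if r.get("source") == "review"]

stored = store(raw_lines)
# Adding a brand-new (hypothetical) location source needs no schema change: just append it.
stored.append('{"source": "location", "device": "x1", "lat": 40.7, "lon": -74.0}')

print(read_bookings(stored))     # [(101, 3, 189.0)]
print(read_review_text(stored))  # ['Great location, slow check-in']
```

In a relational design, the location record would first need a table definition and a decision about how it relates to bookings; here that decision is deferred until someone actually reads the data.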
Cheap and Scalable - Organizations have had to be stingy with their data storage because, in the past, adding storage capacity was fairly expensive. Data storage has since become dramatically cheaper. Hadoop is designed to run on "commodity" hardware, and many database providers are figuring out how to do the same, so organizations no longer have to invest in expensive specialty hardware. Clearly, the amount of data that is stored will only grow over time, so databases need to be scalable as well: it must be easy and inexpensive to bolt on more capacity as the organization's data needs grow and expand.
Fast - Data warehousing vendors work hard to make the process of extracting data from source systems, transforming it to match the database structure and loading it into the database (known as ETL, for extract, transform, load) as fast as possible. The same goes for taking data out of the database for reporting or analytics. As data volumes expand and variety increases, pressure is put on the ETL process. Unstructured data stores like Hadoop are fast in two ways. First, with no pre-defined schema, loading data is much faster. Second, Hadoop uses massively parallel processing to extract information: data is stored in a set of smaller "containers" rather than one giant container, and it is retrieved from every container at the same time. Think about the difference between one person picking all the blue M&Ms out of one large jar, versus dividing the large jar into ten separate jars and asking ten people to each remove the blue M&Ms from one jar.
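The M&M analogy maps onto a split, scan, combine pattern, sketched below in Python. This is an illustration of the idea, not of how Hadoop itself is implemented: the candy data and the `split_into_jars` helper are hypothetical, and a thread pool stands in for the ten people. In CPython, threads share one interpreter lock, so this shows the partitioning logic rather than a real speedup; Hadoop spreads the partitions across separate machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_blue(jar):
    """One worker scans only its own jar (data partition) and counts the blue candies."""
    return sum(1 for candy in jar if candy == "blue")

def split_into_jars(candies, n_jars):
    """Divide one big jar into n smaller jars using round-robin partitioning."""
    return [candies[i::n_jars] for i in range(n_jars)]

def parallel_count_blue(candies, n_jars=10):
    """Scan every jar at the same time, then combine the partial counts."""
    jars = split_into_jars(candies, n_jars)
    with ThreadPoolExecutor(max_workers=n_jars) as pool:
        partial_counts = list(pool.map(count_blue, jars))
    return sum(partial_counts)

random.seed(42)
colors = ["blue", "red", "green", "yellow", "orange", "brown"]
big_jar = [random.choice(colors) for _ in range(100_000)]

# Ten "people" working on ten jars find the same answer as one person with one big jar.
assert parallel_count_blue(big_jar) == count_blue(big_jar)
```

The combine step is cheap here because each partial result is a single number; that is what lets the work split cleanly across containers.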
Many companies have put a good deal of effort into organizing structured data and have made significant investments in data warehouses. For some data sets it makes sense to keep them in a traditional relational structure, and in fact it can be more efficient to do so. This is why many organizations consider Hadoop, or similar data storage platforms, to be an addition to their existing infrastructure rather than a replacement for their relational databases.
Big Data Meets Big Analytics
Big data is not useful on its own; its value lies in the insights you can derive from it. For that you need big analytics. Big data has affected every piece of the analytics value chain, from reporting (quickly rendering reports on billions of rows of data) through advanced analytics like forecasting and optimization, which require complex math executed over multiple passes through the data set.
Analytic processes on big data sets put a strain on processing power, slowing down the time to insight. In today's fast-moving hospitality environment, it is not good enough to push a button and wait hours or days for an answer. Nor is it feasible to make a request to IT and wait weeks or months for a report to be developed or a dataset delivered for analysis. Analytics need to be both fast and accessible.
Analytics vendors such as my company, SAS, have been developing new methods for executing analytics more quickly. Below is a high-level description of some of these methods and why they provide an advantage.
Grid Computing and Parallel Processing – Calculations are split across multiple central processing units (CPUs) to solve many smaller problems in parallel, as opposed to one big problem in sequence. Think about the difference between adding a series of eight numbers one after another versus splitting the problem into four sets of two and handing them out to four of your friends. This is called parallel processing. To accomplish it, multiple CPUs are tied together into a "grid", so the algorithms can access the processing resources of the entire bank of CPUs.
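The "four friends" example can be written as a pairwise (tree) reduction. In this minimal Python sketch, a pool of four threads stands in for the friends or for the CPUs in a grid; the function names are illustrative, not part of any SAS product. Adding eight numbers one after another takes seven sequential additions, while the tree version finishes in three parallel rounds (four pair sums, then two, then one).

```python
from concurrent.futures import ThreadPoolExecutor

def add_pair(pair):
    """The job handed to one 'friend': add two numbers."""
    a, b = pair
    return a + b

def parallel_sum(numbers, workers=4):
    """Pairwise (tree) reduction: each round adds all pairs at once until one total remains.

    Returns the total and the number of parallel rounds it took.
    """
    values = list(numbers)
    rounds = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(values) > 1:
            if len(values) % 2:  # odd count: pad with 0 so every value has a partner
                values.append(0)
            pairs = list(zip(values[0::2], values[1::2]))
            values = list(pool.map(add_pair, pairs))
            rounds += 1
    return (values[0] if values else 0), rounds

numbers = [3, 1, 4, 1, 5, 9, 2, 6]
print(parallel_sum(numbers))  # (31, 3): 3 rounds instead of 7 sequential additions
```

The saving grows with the problem: for n numbers the tree needs roughly log2(n) rounds instead of n - 1 steps, provided there are enough CPUs to handle each round at once.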
In-database Processing - Most analytic programs lift data out of the database, execute the "math" and then write the resulting data sets back into the database. The larger the data sets, the more time-consuming it is to move them around. In-database analytics bring the math to the data instead: the analytics run inside the database, directly on the data sets, reducing time-consuming data movement.
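A small Python sketch can show the difference in data movement. It uses Python's built-in `sqlite3` module with a hypothetical `reservations` table as a stand-in; real in-database analytics push far more complex math than a `SUM` into an enterprise warehouse. Both functions compute revenue by hotel, but the first pulls every row into the application, while the second lets the database do the aggregation and returns only one row per hotel.

```python
import sqlite3

# Hypothetical reservations table held in a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (hotel_id INTEGER, stay_date TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?)",
    [(101, "2024-06-01", 180.0), (101, "2024-06-02", 200.0),
     (102, "2024-06-01", 150.0), (102, "2024-06-02", 170.0)],
)

def revenue_lift_out(conn):
    """Lift every row out of the database and do the math in the application."""
    totals = {}
    for hotel_id, rate in conn.execute("SELECT hotel_id, rate FROM reservations"):
        totals[hotel_id] = totals.get(hotel_id, 0.0) + rate
    return totals

def revenue_in_database(conn):
    """Bring the math to the data: the database aggregates, and one row per hotel comes back."""
    rows = conn.execute(
        "SELECT hotel_id, SUM(rate) FROM reservations GROUP BY hotel_id ORDER BY hotel_id"
    ).fetchall()
    return dict(rows)

print(revenue_in_database(conn))  # {101: 380.0, 102: 320.0}
```

With four rows the difference is invisible, but with billions of reservation rows the first approach moves every row across the wire, while the second moves only the summary.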
In-memory Processing – This capability is a bit harder for non-technical readers to grasp, but it provides a crucial advantage for both reporting and analytics. Large data sets are typically stored on the hard drive of a computer (or server), the physical disk inside the machine. Reading data off the physical disk takes time, and every pass through the data adds more time. It is much faster to conduct analysis and build reports from data held in the computer's memory (RAM). Memory has become cheaper, so it is now possible to add enough memory to hold "big data" sets for significantly faster reporting and analytics.
In many cases, leveraging the analytics execution capabilities described above results in significant time savings. Ad hoc reports on billions of rows of data can render in seconds. Large-scale optimizations, like risk calculations for banks or price optimization for thousands of retail products across hundreds of stores, have gone from hours or days to minutes or seconds. Organizations can run analytics on their entire data set, as opposed to samples or summaries of data. For an analytics-heavy application like pricing and revenue management, you can see how improvements in analytics execution can result in significant improvements in accuracy, as well as the ability to optimize more frequently as business conditions change.
Responsible Use of Big Data
In a recent article for this publication on Big Data, Big Analytics and Revenue Management, I outlined some steps to evaluate a new data source and determine how it should fit into the revenue management environment. I will reiterate here that just because you CAN collect a wide variety of datasets today doesn't mean that you SHOULD. Even though it has become cheaper, managing data still requires resources, and too much data without a clear definition and purpose can become very distracting. It is critical to understand exactly what each data set is (and isn't), what it can be used for and by whom, and how decision making will be improved with access to it.
To ensure this responsible use of big data, many organizations are starting to form data councils. These are cross-functional groups whose purpose is to agree on definitions of key metrics, determine the rules of access for datasets and evaluate new data opportunities. This promotes consistency and ensures that the entire organization can take advantage of big data opportunities.
Examples of tackling big data with big analytics are starting to show up in various industry verticals – including hotels. Here are some examples of how innovative companies are applying big analytics to get value from their big data:
Airline companies incorporate the voice of the customer into their analyses. They mine for passenger sentiment and popular topics across internal and external unstructured text data from social media, forums, passenger surveys, call center logs, and maintenance records. With big text analytics, they can analyze all of their text data, as opposed to small samples, to better understand the passenger experience and improve their service and product offerings.
A major retailer is keeping labor costs down while maintaining service levels by using customer traffic patterns detected by security video to predict when lines will form at the register. This way, staff can be deployed to stocking tasks around the store when there are no lines, yet still get enough notice to open a register as demand increases, before lines start to form.
A major hotel company has deployed a "what if" analysis in its revenue management system that allows users to immediately see the impact of price changes or forecast overrides on their demand, by re-optimizing around the user's changes. Revenue managers no longer have to make a change and wait until the overnight optimization runs to fully understand its impact.
Unlocking the insights in big data with big analytics will require making some investments in modernizing technology environments. The rewards for the investment are great. Organizations that are able to use all that big data to improve the guest experience while maximizing revenue and profits will be the ones that get ahead and stay ahead in this highly competitive environment!