Analyzing reviews and feedback at scale

aliciamw · 02-22-2024 06:20 AM

Unstructured customer feedback is a goldmine of insights, but analyzing open-ended comments manually is time-consuming and unscalable. A few years ago, I tackled this challenge by creating a solution in Google Sheets which was a great start, but it hit limits as data grew.

So in this solution update, I'm bringing in BigQuery! In this post, I'll walk you through analyzing sentiment at scale without writing a single line of code. 1 feedback-sentiment-summaryimage.png

The solution still utilizes Cloud Natural Language and works much like the original: it analyzes the sentiment of feedback by extracting the "entities" (or things people are talking about), assigns a sentiment score (the likelihood the "entity" is being discussed in a positive or negative way), and aggregates the results in a visualization to make these insights more accessible.

Analyzing feedback at scale - with BigQuery

Why BigQuery? BigQuery is a cloud data warehouse (part of Google Cloud) that can store an almost unlimited amount of data and allows you to analyze that data using SQL with incredibly fast processing speeds due to its query engine.

So why'd I use BigQuery for this solution?

BigQuery is integrated: Cloud Natural Language (as well as other Vertex pre-trained models) is available directly within BigQuery for analyzing data without writing any code - just a SQL statement.
BigQuery is fast: Analyzing just over 400 vacation property reviews using BigQuery took 98 seconds versus 301 seconds using my original solution built with Apps Script.
BigQuery is scalable: Apps Script's execution time limit is currently 360 seconds, meaning the 400 rows of test data puts the original solution at its limit, while BigQuery can send over 100,000 rows to Cloud Natural Language for analysis (with standard quota).

Step-by-step sentiment analysis in BigQuery

I'll outline how to perform this sentiment analysis workflow in BigQuery by walking through a few setup steps (which you only need to do once), and then taking you through performing the sentiment analysis itself.

Setup steps (one-time)

Set up your Google Cloud environment: this includes creating a project and enabling APIs required for the workflow
Load sample data into BigQuery: this step walks you through uploading local CSV data into a BigQuery table (using the same dataset from my original demo)
Create a BigQuery Connection: this connection is required for BigQuery to access the translation and sentiment models
Create Natural Language and Translation functions: this step creates functions in BigQuery that directly access the Natural Language (ML.UNDERSTAND_TEXT) and Translation (ML.TRANSLATE) models

Once you've completed the setup steps, you'll have two functions in BigQuery that you can use on an ongoing basis to access any of the features available in the Translation API and the Natural Language API, including entity sentiment analysis, and use them to analyze your BigQuery data.

Sentiment analysis steps (ad-hoc)

Translate feedback text to English: this step uses the Translation function in BigQuery (ML.TRANSLATE)
Analyze sentiment of text data: this step uses the Cloud Natural Language function in BigQuery (ML.UNDERSTAND_TEXT) to process translated feedback for sentiment analysis
Format sentiment analysis response: this step uses SQL to transform the raw results sent from Cloud Natural Language in order to enable analysis
Visualize results for insights: this step exports the transformed data to Google Sheets to summarize and visualize the results

Set up Google Cloud environment

BigQuery, Cloud Natural Language, and Cloud Translation are all part of Google Cloud and require a Google Cloud environment and billing enabled.

A note on costs: The Free Tier of Google Cloud covers many of the services used in this workflow, such as storage and querying data in BigQuery and analyzing text in Cloud Natural Language. I ran through this workflow one time using the sample dataset, and my cost was $3.29 (attributed completely to the Cloud Translation API, which does not have a free tier). The cost of Cloud Translation cost can be covered using the Free Trial, or by skipping the translation step in the workflow.

To set up your environment:

Create a Google Cloud project and ensure billing is enabled. (These instructions are a nice, concise view of these processes, as well.)
Click this link to enable the services in your project that are required for this solution: BigQuery API, BigQuery Connection API, Cloud Translation API and the Cloud Natural Language API.

Load sample data into BigQuery

This section will take you through loading sample vacation rental review data into BigQuery. After you've tested this solution with the sample data, you can follow the documentation to load your own data into BigQuery, as well, to analyze it with this same workflow.

Navigate to the BigQuery Console
Create a dataset by clicking on the three dots next to your project id in the Explorer panel, and selecting Create dataset.
Name the dataset feedback_analysis.
Download the sample vacation property reviews dataset from Kaggle. Expand the zip file and save the reviews.csv file to your desktop.
Create a new table within the dataset by clicking on the three dots next to the dataset name in the explorer panel, and selecting Create table.
Complete the Create table page as outlined in the screenshots below (substituting your own project id in the relevant field) and then click the Create Table button:
Highlight your new table in the explorer panel, and you can see details on the table including a preview of data.
Instead of analyzing all of the properties, this demo will only analyze the reviews of one property (listing_id = 66288), so let's create an intermediary table with just the data we want to further analyze. Click the + button to open a new tab.

Copy and paste the following SQL statement into the query editor. Then click Run.
```
# Create table of text_content for property 66288
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288 AS (
  SELECT
    id,
    comments AS text_content
  FROM `feedback_analysis.reviews`
  WHERE listing_id = 66288 )
```
The completion of this query creates a new table named reviews_66288 within your feedback_analysis dataset that contains 404 rows of reviews for the specified property. The resulting table also includes only the columns we need for analysis, which are the ID of the review and the review content itself (renamed text_content, as this is the naming convention that the ML.UNDERSTAND_TEXT and ML.TRANSLATE functions require to designate the column containing the target text to analyze).

Create a BigQuery connection

Creating a BigQuery Connection is a one-time step that you'll need in order for BigQuery to access the pre-trained models, like Cloud Natural Language, that sit within the Vertex AI product.

Go to the BigQuery page.
Click add Add data, and then click Connections to external data sources.
In the Connection type list, select BigLake and remote functions (Cloud Resource).
In the Connection ID field, enter bq_conn as a name for your connection.
Click Create connection.
Click Go to connection.
In the Connection info pane, copy the service account ID for use in a later step.
Head to the IAM page.
Click Grant Access and enter the service account ID from step 7 in the New principals box.
From the Assign Roles drop-down menu, choose these three roles:
- Service Usage Consumer
- BigQuery Connection User
- Cloud Translation API User
Click Save, and your service account now has all the permissions it needs!

Create Natural Language and Translation models in BigQuery

Now you're ready to use the BigQuery Connection to create remote models in BigQuery which are direct connections to the pre-trained Natural Language and Translate models in Vertex AI. By creating these remote models, you will then be able to use the ML.TRANSLATE and ML.UNDERSTAND_TEXT functions in your BigQuery queries to use the powerful features of these pre-trained models.

Go back to the BigQuery page.
Click Create SQL query.

Copy and paste the following code block containing two SQL statements into the query editor and click Run.

# Create NL Model
CREATE OR REPLACE MODEL `feedback_analysis.vertex_nl_model`
 REMOTE WITH CONNECTION `us.bq_conn`
 OPTIONS (remote_service_type ='cloud_ai_natural_language_v1');

# Create Translate Model
CREATE OR REPLACE MODEL `feedback_analysis.vertex_translate_model`
 REMOTE WITH CONNECTION `us.bq_conn`
 OPTIONS (remote_service_type ='cloud_ai_translate_v3');

Once the queries complete, you will see two models in your Explorer panel.

14 models_explorer.png

Translate text data to English

Some of the reviews from our dataset are in languages other than English. As part of my original demo, I first translated all non-English reviews before analyzing them with Cloud Natural Language. You can do this in BigQuery using the ML.TRANSLATE function and referencing the translation model feedback_analysis.vertex_translate_model that you created in the last section.

Click the + button to open a new query editor tab.

Copy and paste the following SQL statement into the query editor and click Run.

# Translate text_content to English and parse the json
CREATE OR REPLACE TABLE `feedback_analysis.reviews_66288_english` AS (
  SELECT
    id,
    STRING(ml_translate_result.translations[0].detected_language_code) AS `Original Language`,
    text_content AS `original_text`,
    STRING(ml_translate_result.translations[0].translated_text) AS text_content,
    ml_translate_status as `Status`
  FROM ML.TRANSLATE(
    MODEL `feedback_analysis.vertex_translate_model`,
    TABLE `feedback_analysis.reviews_66288`,
    STRUCT('translate_text' AS translate_mode, 'en' AS target_language_code)));

The query statement results in a new table reviews_66288_english for which the text_content column now contains the feedback text translated into English. If you are curious, you can learn more about the syntax of this query in the ML.TRANSLATE documentation.

15 review_66288_english_table.png

Analyze sentiment of text data

Now it's time to take the translated text_content, and send it to Cloud Natural Language for analysis.

Click the + button to open a new query editor tab.
Copy and paste the following SQL statement into the query editor and click Run.
```
# Analyze sentiment of translated reviews for 66288
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288_results AS (
SELECT * FROM ML.UNDERSTAND_TEXT(
  MODEL `feedback_analysis.vertex_nl_model`,
  TABLE feedback_analysis.reviews_66288_english,
  STRUCT('analyze_entity_sentiment' AS nlu_option)))
```
This query runs the translated feedback through the Analyze Entity Sentiment feature of Cloud Natural Language and collects the responses into a new table, reviews_66288_results. If you want to learn more about the query syntax, check out the ML.UNDERSTAND_TEXT documentation.

16 review_66288_results_table.png

We can see the responses from Cloud Natural Language aren't quite ready to interpret for insights; the entities and their sentiment scores are nested inside JSON objects in the ml_understand_text_result column.

Format sentiment analysis response

This final step will unpack the entities and their sentiment scores from each JSON response object, and create a new table that has one line per entity mentioned.

Click the + button to open a new query editor tab.

Copy and paste the following SQL statement into the query editor and click Run.

#Parse results into one line per entity
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288_results_entities AS (
WITH json_data AS (
  SELECT
    id,
    JSON_EXTRACT(ml_understand_text_result, '$.entities') AS entities_json,
    text_content
  FROM
    `feedback_analysis.reviews_66288_results`
)
SELECT
  id,
  CAST(JSON_VALUE(mention.text, '$.content') AS STRING) AS mention_content,
  CAST(COALESCE(JSON_VALUE(mention.sentiment, '$.magnitude'), '0') AS FLOAT64) AS sentiment_magnitude,
  CAST(COALESCE(JSON_VALUE(mention.sentiment, '$.score'), '0') AS FLOAT64) AS sentiment_score
FROM
  json_data,
  UNNEST(JSON_EXTRACT_ARRAY(entities_json, '$')) AS entity,
  UNNEST(JSON_EXTRACT_ARRAY(entity, '$.mentions')) AS mention
)

Executing this query results in a new table called reviews_66288_results_entities, which you can see now has grown from 404 rows of reviews, to almost 6,000 rows of entities which were contained within those reviews, along with their sentiment scores.

17 review_66288_results_entities_table.png

Visualize results for insights

Now let's take 6,000 rows of data and visualize it to help interpret and understand the results. BigQuery is integrated with a few visualization tools, including connecting with one-click to Looker Studio and Google Sheets. Since my original demo utilized a Google Sheets chart, we'll follow the same path here.

You can click the Export button and choose Explore with Sheets.

Once the data was connected into Google Sheets, I recommend pulling the full set of data into an extract so that you can work with the full functionality of Sheets (and without incurring any BigQuery costs).

For the below visualization, I further aggregated the data into a pivot table to get an average sentiment score and total number of mentions by entity, and then created a scatter chart to plot each entity on these metrics.

19 sheets_chart.png

We can immediately see common vacation rental topics such as "location", "noise", and "bed" and understand their average sentiment was likely positive (you can read more about interpreting sentiment score values in the documentation).

Next steps

Whew! I know that was quite a walkthrough, but now you know how to use BigQuery with Cloud Natural Language and Cloud Translation to analyze open-ended feedback! You have the setup steps complete, so you can continue on your journey and try out...

Loading your own data into BigQuery and analyzing it in a similar way.
Using other features of Cloud Natural Language to analyze text, including analyzing sentiment at a sentence/document level, extracting entities from text, and classifying text.
Discovering the other models that integrate with BigQuery to help you analyze audio files, images, documents, and other types of unstructured data.

Once you're done, make sure you shut down your Google Cloud project. While the amount of data you have stored should fit within the BigQuery free tier, it is safest to prevent any further charges.

I'd love to build more low-code/no-code, scalable analysis solutions in the future - let me know if there are any particular challenges you are having with this solution or in your own business!

ellmag1

When going through this guide I get the error "Table-valued function not found: ML.translate
Running this code:

SELECT * FROM ML.TRANSLATE(

MODEL `project_name.x_EM_dev.translate_v3`,

(SELECT FaultDesc2 AS text_content from `project_name.x_EM_dev.si_ex_orders_test`),

STRUCT('DETECT_LANGUAGE' AS translate_mode));

Why would this error occour?

aliciamw

@ellmag A few things to check:

-Are your connection and the `x_EM_dev` dataset both in either the US or EU multiregion? This is required for ML.TRANSLATE.

-Try running through the documentation tutorial for ML.TRANSLATE and see if perhaps you lack a required permission, or if you missed a role added for the service account: https://cloud.google.com/bigquery/docs/translate-text

Let me know what you find out, especially if there's something I can update in the article!

ellmag2

Hi,

Thank you for the response. The error magically disappeared after a couple of hours, and now everything works. My best guess is that it had to do with latency when enabling the Cloud Translate API or something similar.

With this in mind; it might be worthwhile to add to the guide if someone else experiences the same.

Cheers,
Elliot