Unstructured customer feedback is a goldmine of insights, but analyzing open-ended comments manually is time-consuming and unscalable. A few years ago, I tackled this challenge by creating a solution in Google Sheets which was a great start, but it hit limits as data grew.
So in this solution update, I'm bringing in BigQuery! In this post, I'll walk you through analyzing sentiment at scale without writing a single line of code.
The solution still utilizes Cloud Natural Language and works much like the original: it analyzes the sentiment of feedback by extracting the "entities" (or things people are talking about), assigns a sentiment score (the likelihood the "entity" is being discussed in a positive or negative way), and aggregates the results in a visualization to make these insights more accessible.
Why BigQuery? BigQuery is a cloud data warehouse (part of Google Cloud) that can store an almost unlimited amount of data and allows you to analyze that data using SQL with incredibly fast processing speeds due to its query engine.
So why'd I use BigQuery for this solution?
I'll outline how to perform this sentiment analysis workflow in BigQuery by walking through a few setup steps (which you only need to do once), and then taking you through performing the sentiment analysis itself.
Once you've completed the setup steps, you'll have two functions in BigQuery that you can use on an ongoing basis to access any of the features available in the Translation API and the Natural Language API, including entity sentiment analysis, and use them to analyze your BigQuery data.
BigQuery, Cloud Natural Language, and Cloud Translation are all part of Google Cloud and require a Google Cloud environment and billing enabled.
A note on costs: The Free Tier of Google Cloud covers many of the services used in this workflow, such as storage and querying data in BigQuery and analyzing text in Cloud Natural Language. I ran through this workflow one time using the sample dataset, and my cost was $3.29 (attributed completely to the Cloud Translation API, which does not have a free tier). The cost of Cloud Translation cost can be covered using the Free Trial, or by skipping the translation step in the workflow.
To set up your environment:
This section will take you through loading sample vacation rental review data into BigQuery. After you've tested this solution with the sample data, you can follow the documentation to load your own data into BigQuery, as well, to analyze it with this same workflow.
# Create table of text_content for property 66288
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288 AS (
SELECT
id,
comments AS text_content
FROM `feedback_analysis.reviews`
WHERE listing_id = 66288 )
The completion of this query creates a new table named reviews_66288 within your feedback_analysis dataset that contains 404 rows of reviews for the specified property. The resulting table also includes only the columns we need for analysis, which are the ID of the review and the review content itself (renamed text_content, as this is the naming convention that the ML.UNDERSTAND_TEXT and ML.TRANSLATE functions require to designate the column containing the target text to analyze).Creating a BigQuery Connection is a one-time step that you'll need in order for BigQuery to access the pre-trained models, like Cloud Natural Language, that sit within the Vertex AI product.
Now you're ready to use the BigQuery Connection to create remote models in BigQuery which are direct connections to the pre-trained Natural Language and Translate models in Vertex AI. By creating these remote models, you will then be able to use the ML.TRANSLATE and ML.UNDERSTAND_TEXT functions in your BigQuery queries to use the powerful features of these pre-trained models.
# Create NL Model
CREATE OR REPLACE MODEL `feedback_analysis.vertex_nl_model`
REMOTE WITH CONNECTION `us.bq_conn`
OPTIONS (remote_service_type ='cloud_ai_natural_language_v1');
# Create Translate Model
CREATE OR REPLACE MODEL `feedback_analysis.vertex_translate_model`
REMOTE WITH CONNECTION `us.bq_conn`
OPTIONS (remote_service_type ='cloud_ai_translate_v3');
Once the queries complete, you will see two models in your Explorer panel.Some of the reviews from our dataset are in languages other than English. As part of my original demo, I first translated all non-English reviews before analyzing them with Cloud Natural Language. You can do this in BigQuery using the ML.TRANSLATE function and referencing the translation model feedback_analysis.vertex_translate_model that you created in the last section.
# Translate text_content to English and parse the json
CREATE OR REPLACE TABLE `feedback_analysis.reviews_66288_english` AS (
SELECT
id,
STRING(ml_translate_result.translations[0].detected_language_code) AS `Original Language`,
text_content AS `original_text`,
STRING(ml_translate_result.translations[0].translated_text) AS text_content,
ml_translate_status as `Status`
FROM ML.TRANSLATE(
MODEL `feedback_analysis.vertex_translate_model`,
TABLE `feedback_analysis.reviews_66288`,
STRUCT('translate_text' AS translate_mode, 'en' AS target_language_code)));
The query statement results in a new table reviews_66288_english for which the text_content column now contains the feedback text translated into English. If you are curious, you can learn more about the syntax of this query in the ML.TRANSLATE documentation.
Now it's time to take the translated text_content, and send it to Cloud Natural Language for analysis.
# Analyze sentiment of translated reviews for 66288
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288_results AS (
SELECT * FROM ML.UNDERSTAND_TEXT(
MODEL `feedback_analysis.vertex_nl_model`,
TABLE feedback_analysis.reviews_66288_english,
STRUCT('analyze_entity_sentiment' AS nlu_option)))
This query runs the translated feedback through the Analyze Entity Sentiment feature of Cloud Natural Language and collects the responses into a new table, reviews_66288_results. If you want to learn more about the query syntax, check out the ML.UNDERSTAND_TEXT documentation.
We can see the responses from Cloud Natural Language aren't quite ready to interpret for insights; the entities and their sentiment scores are nested inside JSON objects in the ml_understand_text_result column.
This final step will unpack the entities and their sentiment scores from each JSON response object, and create a new table that has one line per entity mentioned.
#Parse results into one line per entity
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288_results_entities AS (
WITH json_data AS (
SELECT
id,
JSON_EXTRACT(ml_understand_text_result, '$.entities') AS entities_json,
text_content
FROM
`feedback_analysis.reviews_66288_results`
)
SELECT
id,
CAST(JSON_VALUE(mention.text, '$.content') AS STRING) AS mention_content,
CAST(COALESCE(JSON_VALUE(mention.sentiment, '$.magnitude'), '0') AS FLOAT64) AS sentiment_magnitude,
CAST(COALESCE(JSON_VALUE(mention.sentiment, '$.score'), '0') AS FLOAT64) AS sentiment_score
FROM
json_data,
UNNEST(JSON_EXTRACT_ARRAY(entities_json, '$')) AS entity,
UNNEST(JSON_EXTRACT_ARRAY(entity, '$.mentions')) AS mention
)
Executing this query results in a new table called reviews_66288_results_entities, which you can see now has grown from 404 rows of reviews, to almost 6,000 rows of entities which were contained within those reviews, along with their sentiment scores.
Now let's take 6,000 rows of data and visualize it to help interpret and understand the results. BigQuery is integrated with a few visualization tools, including connecting with one-click to Looker Studio and Google Sheets. Since my original demo utilized a Google Sheets chart, we'll follow the same path here.
You can click the Export button and choose Explore with Sheets.
Once the data was connected into Google Sheets, I recommend pulling the full set of data into an extract so that you can work with the full functionality of Sheets (and without incurring any BigQuery costs).
For the below visualization, I further aggregated the data into a pivot table to get an average sentiment score and total number of mentions by entity, and then created a scatter chart to plot each entity on these metrics.
We can immediately see common vacation rental topics such as "location", "noise", and "bed" and understand their average sentiment was likely positive (you can read more about interpreting sentiment score values in the documentation).
Whew! I know that was quite a walkthrough, but now you know how to use BigQuery with Cloud Natural Language and Cloud Translation to analyze open-ended feedback! You have the setup steps complete, so you can continue on your journey and try out...
Once you're done, make sure you shut down your Google Cloud project. While the amount of data you have stored should fit within the BigQuery free tier, it is safest to prevent any further charges.
I'd love to build more low-code/no-code, scalable analysis solutions in the future - let me know if there are any particular challenges you are having with this solution or in your own business!