Skip to main content

Data mapping in the BigQuery integration

Part 2 of the the setup steps for the BigQuery integration

Christian Dreyer avatar
Written by Christian Dreyer
Updated over a year ago

Summary

  • The second stage of setting up the BigQuery integration is configuring data mapping

  • For CRM data, you choose which Planhat data model to map with which BigQuery table or view

    • You also select the sync direction (which can be bidirectional), specify the unique IDs on each side, and set up field mapping between Planhat fields and BigQuery columns

  • For time-series usage data, you choose whether you want to map each BigQuery table or view into Planhat as either User Activities or Custom Metrics

    • You also specify the key (ID) in BigQuery, set up field mapping, and choose the batch size and sync frequency

Who is this article for?

  • Planhat users who are setting up the BigQuery integration (e.g. Tech/Ops)

Series

This article is part of a series on the BigQuery integration:


Article contents


Introduction

This is a technical deep-dive article

Read on if you're an Admin / CS Ops / Tech Specialist etc. and are setting up data mapping in the BigQuery integration in Planhat.

If instead you'd just like an overview of the BigQuery integration, please refer to our main article for the key points.

πŸ“Œ Important to note

This is part 2 of our BigQuery integration setup guide.

Before reading this article, make sure that you have already connected the Planhat BigQuery integration to BigQuery (with "Authentication" in the integration showing green with a tick/check mark), enabled the integration at the top, and selected your database.

If you haven't yet completed those steps, follow part 1 of the setup guide, and come back to this article when your connection is ready.

Now BigQuery and Planhat are connected, we're going to cover how you can configure it for your specific selection of data, and start syncing data between the two systems.

You can choose whether to sync CRM data and/or usage data in the BigQuery integration. This article is divided into these two main parts:

In summary, you will be creating and configuring sync sections, and the main principles for each sync section are:

  • Choose which data model/type in Planhat to map to which column in BigQuery

  • For CRM data, choose the sync direction

  • Specify the IDs/keys used in Planhat and BigQuery

  • Set up field mapping

  • For usage data, choose the sync frequency and batch size

πŸ“Œ Important to note

Please ensure that you do not map a column in your BigQuery table named the same as the table itself. Due to how the data is structured by BigQuery, this prevents us from being able to run the connection.


1. CRM data in the BigQuery integration


The BigQuery integration has broad and flexible CRM syncing capabilities, and you can choose which Planhat data models to sync with BigQuery. Choose from the wide range of standard models, from the main ones such as Company, End User and License, to more specialised models such as Campaign, Objective and Workspace.

You choose which Planhat models to map, and you can map as many or as few as you want.

Each sync section corresponds to 1 Planhat model and 1 BigQuery table (or view), but you can create more sync sections if you would like to map the same model or table/view again - example shown below.

The integration supports both Planhat standard and custom fields. Again, you choose which to map to your choice of BigQuery columns.

πŸš€ Tip

If you set up a sync section for the Conversation model, this includes "Logged Activities" and "Notes". The BigQuery integration doesn't support syncing emails.


How to set up CRM data mapping

Firstly, ensure you have followed all the steps in our article on preparing BigQuery and connecting the integration.

Before you begin data mapping, make sure that you have selected your database. Also ensure the "CRM" tab is selected - this is the default tab when you first open the integration.

Click the blue "Add New Section" button to add a new sync section. Each sync section is for mapping one Planhat model to one BigQuery table or view; simply add another one when you want to map another model / table (or view) combination.

It's easy to set this up. The basic steps are:

  1. Select the Planhat model on the left, and the BigQuery table/view you'd like to connect to it on the right

  2. Choose the overall sync direction - this can be:

    • "Send to Provider" (BigQuery)

    • "Receive from Provider" (BigQuery)

    • "Both Directions"

    • "Not Syncing"

    See here for further details

  3. Select the unique IDs (keys) in Planhat and BigQuery

    • Specify the ID field on the Planhat side (e.g. External ID or Source ID)

    • Choose a suitable column on the BigQuery side - IDs must be unique and text

    • See here for further details

  4. Set up field mapping

    • Map your choice of Planhat fields to your choice of BigQuery columns

    • See here for further details

    • For Planhat models other than Company, configure the reference to Company, by expanding the field mapping row for "Company Id" and selecting the appropriate ID(s) - see here for further details (noting that the linked article was written for Snowflake, but the same principles apply to BigQuery)

  5. Click "Save Section" at the bottom right of the section to save your configuration

... simple! πŸ‘


Sync direction

The BigQuery integration is bidirectional for CRM data, meaning you can choose to push data to, and pull data from, each system.

You set the sync direction independently for each sync section (pair of Planhat data model plus BigQuery table/view), offering a great deal of flexibility.

For each sync section, you can choose from "Send to Provider", "Receive from Provider", or "Both Directions" - or "Not Syncing" means no data will be synced.

It's possible to change the sync direction after your initial configuration - e.g. you could initially select "Receive from Provider" when you're setting up your Planhat tenant, and then change it to "Both Directions" later on.

For data security reasons, regardless of direction, the integration does not delete records (e.g. if you manually delete an End User in Planhat, the integration won't delete the corresponding data in BigQuery), but records can be created and updated.

πŸ“Œ Important to note

If a section is set to "Both Directions", data will first be fetched from BigQuery, and then pushed from Planhat. This order of operations might explain any unexpected results. If you don't get the results you're looking for when you sync, reach out to your TAM or CSM.

πŸ“Œ Important to note

Data cannot be synced from Planhat back to BigQuery views, as views are read-only.


Unique IDs

For each sync section, you will see boxes for IDs under the Planhat model and BigQuery table/view.

The ID (key) on the Planhat side could be Source ID or External ID, or alternatively email address for End User (your customers) and User (your colleagues). These are all fields on the Planhat models, containing a unique ID to specify that particular record (e.g. the Company record for Apple). Note that "Id" in the dropdown box corresponds to the Planhat ID; see the "Important to note" box at the bottom of this section.

For the ID (key) on the BigQuery side, select a suitable column from the dropdown menu. The ID/key must be unique and text. The name of the ID in BigQuery can be different, depending on your own system.

Here are a couple of examples of what this could look like in the integration:

πŸ“Œ Important to note

It's strongly recommended to use Source ID or External ID (which are IDs that have originated outside of Planhat), rather than the Planhat native ID ("Id"), when configuring data mapping in the BigQuery integration. If you select Planhat ID ("Id"), there is a risk of duplicate rows being created in BigQuery.

This is because when a record (e.g. a Company) is synced from Planhat to BigQuery, the integration has to send an instruction to either create it or update it (not both) in BigQuery.

If you have selected Source ID or External ID as your ID (key), the integration has special logic to try to identify whether the record should be created or updated in BigQuery. When a record in Planhat has this field blank, that suggests that it hasn't been synced in from BigQuery, so when it's synced from Planhat to BigQuery, the record is created. When Source ID or External ID is populated in the record in Planhat, that suggests it's synced from BigQuery to Planhat, so when that record is synced from Planhat to BigQuery, it's updated.

However, records always have the native Planhat ID ("Id") populated in Planhat, so it's not possible for the integration to use whether it's empty or not as a way to determine whether it should be created or updated in BigQuery. Therefore, if you sync to BigQuery with "Id" selected, that record will always be created in BigQuery (even if it already exists there).

If you have duplicates in BigQuery, when you use the data there, you'll need to query the data to find the most recent row with that ID.


Field mapping

For each of the sync sections, you can set up field mapping: link up specific Planhat fields (standard or custom fields) with specific BigQuery columns. (Note that the integration does not create fields - it only syncs data between them - so you would need to create any required custom fields first before selecting them in the integration.)

The field mapping for a sync section will look something like this:

This is very simple to set up.

  1. Click the blue "Map a Field" button to add a new field mapping row

  2. Select a Planhat field from the left-hand dropdown menu

    • Selecting a field will automatically detect and display the field type

  3. Choose the sync direction

    • The default is the same sync direction as the parent model/table/view

  4. Select a BigQuery column from the right-hand dropdown menu

    • Selecting a field will automatically detect and display the field type

If you want to delete a field mapping row, simply click the red "x" on the right-hand side.

πŸ“Œ Important to note

Mapping User-type fields

The BigQuery integration supports mapping of Team Member fields, which refer to the Planhat User model. (Note that this is a separate model to End User, which is for customer contacts.)

For example, this includes mapping the "Owner" and "Co-Owner" standard fields on the Company model.

When setting up these field mappings, click the chevron on the right-hand side of the row to expand it and show further configuration. Then, specify what type of unique ID is used to identify the Users, such as "External ID" or "Email". This flexibility enables you to use your choice of ID in BigQuery to associate Companies etc. with Users, rather than just using the User Planhat ID.

Below is an example of field mapping on the Company model - it's a screenshot from the Snowflake integration, but the principle is the same in the BigQuery integration. Because in this case it's set to receive from provider, the "Replace value with when receiving from Snowflake/BigQuery" needs to be defined on the left-hand side; the right-hand side would need to be filled out if sending to Snowflake/BigQuery.

πŸ“Œ Important to note

Mapping Company associations - converting between Company IDs

  • In Planhat, the Company model is at the top of the data structure. Other models (such as End Users) link to Company

  • Although in Planhat this link works via the Company Planhat ID, if you're using the Company Source ID or Company External ID in BigQuery instead, the integration can convert between the ID types for you

  • It's easy to configure in the integration which ID you want to use. You do this via the field mapping row for "Company Id" in each relevant sync section

  • For further details, see this in-depth article - the linked article was written for Snowflake, but the same principles apply for BigQuery

πŸš€ Tip

When mapping the "Involved Contacts" field in the Conversation model, ensure that the mapped column in BigQuery is formatted as an array of external or source IDs, with each cell structured like: ["source_id_1", "source_id_2", ...].


Enabling the CRM data sync

  1. When you're done creating (and saving!) all the sections you want, click the blue "Full sync" button on top right side of the integration (just above the sync sections) to carry out a manual full sync

  2. Enable the toggle switch "activate BigQuery automation synchronization", assuming you would like the integration to run the automatic hourly sync

    • If you don't want CRM data to sync automatically in this way, you can leave the toggle switch off. You can manually run full syncs via the "Full sync" button and/or the arrow buttons on the right-hand side of sync sections

πŸš€ Tip

To confirm that your "Full sync" successfully completed, click "Show Logs" at the top right of the integration. You should be able to see creations and updates with (roughly) the expected number of events. If you have any questions, you can reach out to your TAM or CSM for help.


2. Usage data in the BigQuery integration


The BigQuery integration also supports syncing of usage (time-series) data. This is always one-way: from BigQuery into Planhat.

πŸš€ Tip

There are two types of raw time-series data in Planhat:

  • "User Activities" are actions carried out by specific End Users, e.g. logging in

    • This End User data is automatically rolled up to the Company model, and can also be associated with Assets or Projects

  • "Custom Metrics" are other numerical values you want to track over time, e.g. amount of storage used by a Company

    • Custom Metrics are most commonly sent for the Company model, but alternatively you can choose to have Custom Metrics for the End User, Asset or Project models

Once this usage data has been imported into Planhat, you can easily use Calculated Metrics to convert and combine raw metrics into new, more complex metrics to show exactly what you want and need to know about your customer's usage of your products.

So, for example, you can send in unaggregated logins as a User Activity, and then within Planhat create Calculated Metrics to look at total number of logins in the last 14 days, or the average number of logins per week, and so on.

Before you start mapping usage data, ensure you have followed all the steps in our article on preparing BigQuery and connecting the integration, and have selected your database.

To set up the usage data configuration, make sure you have selected the "USAGE" tab in the integration:


How to set up usage data mapping

Simply click the blue "Add New Section" button to add a new sync section, and then follow the steps to complete the form.

  1. "Data type" - choose from "User Activities" and "Custom Metrics"

    • Remember that:

      • "User Activities" are actions carried out by End Users, e.g. logging in

      • "Custom Metrics" are other numerical values you want to track over time, e.g. amount of storage used by a Company

  2. "Sync direction" - to enable the sync, set this to "Receive from Provider"

    • It's only possible to have usage data sync from BigQuery, not to BigQuery

  3. "Model in BigQuery" - select the table or view in BigQuery containing the data to import

  4. "Batch size" - define the number of rows to be fetched per batch

    • For example, if your BigQuery table contains 100,000 rows, and your batch size is 10,000, Planhat will issue 10 separate sync requests

  5. "Model" - for Custom Metrics only (not User Activities). This is referring to the model in Planhat

    • Single-model mapping - if you are configuring a sync section where every row/event in the BigQuery table/view corresponds to a Custom Metric for the same model in Planhat (i.e. all for the Company model, or otherwise all for End User, or all for Asset, or all for Project), then here you should select the appropriate model from the dropdown menu

    • Multi-model mapping - if you want to map rows/events corresponding to different models in the same sync section, you can leave "Model" blank, and then specify the model via field mapping instead - for more on this, see here

  6. "Run once in" - choose how frequently you want the integration to fetch data from BigQuery. This has the option to fetch every:

    • "5 minutes"

    • "10 minutes"

    • "30 minutes"

    • "1 hour"

    • "Every day at:" - a specific hour of the day - choose from any between 00:00 and 23:00

  7. "Key in BigQuery" - here you specify which BigQuery column we'll use as a unique ID for the incoming rows of values

    • It needs to be unique, numeric, and incrementing (increasing in value) for each new row added - this is so we can identify which rows are new

    • Note that with usage data (as opposed to CRM data), you don't need to specify a key/ID on the Planhat side

  8. "Field mappings" - here you can map User Activity or Custom Metric properties to columns in BigQuery

    • This is easy, as you can choose from dropdown menus

    • Just click the blue "Map a Field" button each time you want to add a new field mapping row

    • For more tips on field mapping for usage data, see here

    • You can also use the Planhat API docs for property details

  9. Once you've completed the form, save your configuration by clicking "Save Section"

  10. If you want to connect another BigQuery table/view to User Activities or Custom Metrics, simply click the blue "Add New Section" button at the bottom, and repeat the steps you've just gone through

  11. The integration will automatically sync in the usage data at the frequency you set. (There is no "Full sync" button for the usage data tab of the integration.) So sit back and watch your usage data sync into Planhat like magic!

πŸ“Œ Important to note

Once you've finished setting up your configuration as described above, you should simply wait for the next sync to start seeing this data sync into Planhat, as is stated in step 11.

The red sync button on each sync section would just reset the incrementing key, meaning at the next sync, Planhat would fetch the entire table or view.

Here are a couple of examples showing what a completed (and saved) sync section can look like:

Click the images to view them enlarged


Field mapping

User Activities

Here we'll give you some tips on which properties on the Planhat side (left-hand column) you should map when setting up a sync section for User Activities. You can also refer to our API documentation for further details of User Activity properties.

As a minimum, you should map:

  • "action" - this is the name of the User Activity, such as "Logged in"

  • "email" or "euExtId" (the External ID of the End User) - these are ways to identify the End User in Planhat

If each row of your table is a count of events, rather than a single event, you will also need to map "count".

Additionally, if you do not want the data to be synced to today's date, you will need to map "date".

Here's a typical example of field mapping for User Activities:

Additionally, if you wish to map User Activities to the Project or Asset models, you will need to map "assetExtId" or "projectExtId" as appropriate - this is the External ID for the relevant model in Planhat.

Custom Metrics

Here we'll give you some tips on which properties on the Planhat side (left-hand column) you should map when setting up a sync section for Custom Metrics. You can also refer to our API documentation for further details of Custom Metric properties.

As a minimum, you should map:

  • "dimensionId" - this is the name of the Custom Metric, such as "Requests used"

  • "value" - this is the numerical value you would like to set

  • "externalId" - this is the External ID corresponding to either:

    • The model you selected in the "Model" box at the top of the sync section, above the field mappings (in single-model mapping), or ...

    • The model specified in that row (data) for the "model" field/property, which you also need to map in the field mappings (in multi-model mapping)

    • For more on single-model mapping v. multi-model mapping, see step 5 in the section above

Additionally, if you do not want the data to be synced to today's date, you will need to map "date".

Here's a typical example of field mapping for Custom Metrics:


Automatic End User creation

In Planhat, it's possible for new End Users to be automatically created from User Activities, if those End Users can be matched with an existing Company in Planhat. This is the case for syncing User Activities via the API or any suitable integration, including BigQuery.

New End Users can be matched with existing Companies via the domain of their email address, or via the Company External ID if stated.

If you would prefer that End Users are not created automatically in this way, speak to your CSM or TAM, and they can disable this feature in your tenant. Note that this applies generically to creating new End Users from User Activities, rather than specifically BigQuery.


Further reading

For more information on field mapping of "Company Id" to use different ID types in BigQuery, see here - the article was written for Snowflake, but the same principles apply to BigQuery.

[Tag for search purposes: Big Query]

Did this answer your question?