Data Enrichment

This example shows how to enrich a document with additional attributes by using a Couchbase Function.

Need for Data Enrichment

Some attributes are hard to query in the form in which they are stored. An IP address stored as a dotted string is one example: it is not search-supportive.

When documents with such attributes undergo mutation, a Couchbase Function can append an additional attribute that is easier to query. The original attributes are not modified. Later, when the document needs to be retrieved, you can query on the newly appended attribute. The enriched document is now search-supportive.
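To make the problem concrete, here is a minimal sketch in plain JavaScript (run outside Couchbase, for example in Node.js) of one way a dotted IP string can mislead a range comparison, assuming the comparison is made on the raw strings. The address 5.62.60.10 and the helper lastOctet are hypothetical and used only for illustration.

```javascript
// Two addresses in the same subnet; numerically, .9 comes before .10.
const lower = "5.62.60.9";
const higher = "5.62.60.10"; // hypothetical address, for illustration only

// String comparison works character by character, so "9" sorts after "1".
const stringSaysLowerFirst = lower < higher;

// Comparing the last octets as numbers gives the intended order.
const lastOctet = (ip) => Number(ip.split(".")[3]);
const numberSaysLowerFirst = lastOctet(lower) < lastOctet(higher);

console.log(stringSaysLowerFirst); // false
console.log(numberSaysLowerFirst); // true
```

Storing a numeric form of the address alongside the string avoids this mismatch.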

Developing the Handler Code

Create a JavaScript Function that contains an OnUpdate handler. This handler listens for data changes within a source bucket. When a document within this source bucket undergoes mutation, the handler executes a user-defined routine. In this example, the routine converts each IP address (both the beginning and the end of an address range) to an integer.

The handler executes the routine, adds two fields to the document, and writes the result as a new document to the target bucket. This new document is identical to the original one, except that it has the two additional fields. You can query the document using these two fields.

Sample handler code for data enrichment follows.

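function OnUpdate(doc, meta) {
  log('document', doc);
  // Add numeric forms of the range boundaries; the original fields are kept.
  doc["ip_num_start"] = get_numip_first_3_octets(doc["ip_start"]);
  doc["ip_num_end"]   = get_numip_first_3_octets(doc["ip_end"]);
  // tgt is the bucket binding (alias) for the target bucket.
  tgt[meta.id] = doc;
}

// Converts a dotted IPv4 string to an integer, using all four octets.
// Returns 0 when ip is missing or empty.
function get_numip_first_3_octets(ip)
{
  var return_val = 0;
  if (ip)
  {
    var parts = ip.split('.');

    // IP Number = A x (256*256*256) + B x (256*256) + C x 256 + D
    return_val = (parseInt(parts[0], 10)*(256*256*256)) + (parseInt(parts[1], 10)*(256*256)) + (parseInt(parts[2], 10)*256) + parseInt(parts[3], 10);
  }
  return return_val;
}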
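The sample helper assumes a well-formed dotted IPv4 string. As a minimal sketch of how the conversion could be checked and hardened outside Couchbase (for example in Node.js), the following hypothetical function ipToInt applies the same formula but returns null for malformed input. The function name, the null convention, and the validation rules are assumptions made for illustration; they are not part of the sample Function.

```javascript
// Hypothetical stricter variant of the conversion; ipToInt is not part of
// the sample handler. Returns null for anything that is not a dotted IPv4
// string with four octets in the range 0 to 255.
function ipToInt(ip) {
  if (typeof ip !== "string") return null;
  const parts = ip.split(".");
  if (parts.length !== 4) return null;

  let result = 0;
  for (const part of parts) {
    // Accept only 1 to 3 decimal digits per octet.
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    // Equivalent to A*256^3 + B*256^2 + C*256 + D, accumulated step by step.
    result = result * 256 + octet;
  }
  return result;
}

console.log(ipToInt("5.62.60.1")); // 87964673
console.log(ipToInt("5.62.60.9")); // 87964681
console.log(ipToInt("5.62.60"));   // null
```

For the sample address 5.62.60.1, the arithmetic is 5 x 16777216 + 62 x 65536 + 60 x 256 + 1 = 87964673.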

Prerequisites

Before you begin, ensure that:

  • Three buckets are created: a metadata bucket, a target bucket, and a source bucket. See Creating Buckets.

  • A document, Sample Document, is added to the source bucket. Edit this document and enter the following details:

{
   "country": "AD",
   "ip_end": "5.62.60.9",
   "ip_start": "5.62.60.1"
}

Procedure

  1. From the Couchbase Web Console > Eventing page, click ADD FUNCTION to add a new Function.

  2. In the ADD FUNCTION dialog, provide the following information for the individual Function elements:

    1. For the Source Bucket drop-down, select the source bucket option that was created for this purpose.

    2. For the Metadata Bucket drop-down, select the metadata bucket option that was created for this purpose.

    3. In the Function Name text-box, enter enrich_ip_nums as the name of the Function you are creating.

    4. In the Description text-box, enter: Enrich a document, converts IP Strings to Integers that are stored in new attributes.

    5. For the Settings option, use the default values.

    6. For the Bindings option, specify target as the name of the bucket, and specify tgt as its associated value. The handler code uses tgt as the alias through which it writes to the target bucket.

  3. In the ADD FUNCTION dialog, click Next: Add Code. The enrich_ip_nums dialog appears. The enrich_ip_nums dialog initially contains a placeholder code block. You will substitute your actual enrich_ip_nums code in this block.

  4. Copy the sample Function handler code and paste it in the placeholder code block of the enrich_ip_nums dialog.

  5. After pasting, confirm that the enrich_ip_nums dialog shows the handler code in place of the placeholder code block.

  6. Click Save.

  7. To return to the Eventing screen, click Eventing and click on the newly created Function name. The Function enrich_ip_nums is listed as a defined Function.

  8. Click Deploy.

  9. In the Confirm Deploy Function dialog, from the Feed boundary drop-down, select Everything and click Deploy Function. From this point, the defined Function is executed on all existing documents and on subsequent mutations.

  10. To check results of the deployed Function, click the Documents tab.

  11. Select the target bucket from the Bucket drop-down.
    A version of SampleDocument has been added to the target bucket. It contains all the attributes of the original document, plus ip_num_start and ip_num_end, which contain the numeric values that correspond to ip_start and ip_end, respectively. Other documents added to the source bucket that have the ip_start and ip_end attributes are handled by the Function in the same way: creating such a document, or changing any attribute in one, causes the Function to execute.
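To preview the enriched document without deploying anything, the handler logic can be simulated in plain JavaScript (for example in Node.js). This is a sketch under stated assumptions: the in-memory object tgt stands in for the bucket binding, toNum is a hypothetical compact form of the handler's conversion, the log call is omitted, and the document key SampleDocument is assumed.

```javascript
// Stand-in for the `tgt` bucket binding. In Eventing the binding is supplied
// by the Function's Bindings setting; here it is just an in-memory object.
const tgt = {};

// Hypothetical compact form of the handler's arithmetic:
// ((A*256 + B)*256 + C)*256 + D.
const toNum = (ip) =>
  ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);

function OnUpdate(doc, meta) {
  doc["ip_num_start"] = toNum(doc["ip_start"]);
  doc["ip_num_end"] = toNum(doc["ip_end"]);
  tgt[meta.id] = doc;
}

// The sample document from the Prerequisites; the key is an assumption.
OnUpdate(
  { country: "AD", ip_end: "5.62.60.9", ip_start: "5.62.60.1" },
  { id: "SampleDocument" }
);

console.log(JSON.stringify(tgt["SampleDocument"], null, 2));
```

For the sample document, ip_num_start evaluates to 87964673 and ip_num_end to 87964681, while country, ip_start, and ip_end are unchanged.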

To summarize, fetching a document by querying its IP address strings was a tedious process. The Function therefore used those IP addresses to enrich the existing document with two new numeric attributes. The document can now be fetched by querying on the newly appended attributes, which is easier than querying on the original IP address strings.
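As an illustration of why the numeric attributes make lookups easier, the following sketch filters enriched documents in memory with plain JavaScript. It is not a Couchbase query; it only shows the range comparison that the new attributes enable. The second document, the country code XX, and the helpers toNum and findByIp are hypothetical.

```javascript
// Enriched documents as they might appear in the target bucket.
// The second entry is hypothetical, added so the filter has something to exclude.
const docs = [
  { country: "AD", ip_start: "5.62.60.1", ip_end: "5.62.60.9",
    ip_num_start: 87964673, ip_num_end: 87964681 },
  { country: "XX", ip_start: "5.62.60.10", ip_end: "5.62.60.20",
    ip_num_start: 87964682, ip_num_end: 87964692 },
];

// Hypothetical helper: the same conversion the handler applies.
const toNum = (ip) =>
  ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);

// Return every document whose numeric range contains the given address.
function findByIp(ip) {
  const n = toNum(ip);
  return docs.filter((d) => d.ip_num_start <= n && n <= d.ip_num_end);
}

console.log(findByIp("5.62.60.5").map((d) => d.country)); // [ 'AD' ]
```

With only the string attributes, the same lookup would need the conversion to be applied to every document at query time.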