Indexing Integration

Overview

Integrating Twiggle’s API into your indexing process adds semantic understanding of your catalog data to your Solr index. This is required in order for catalog listings that match your users’ search queries to be available for retrieval via the Twiggle API.

Twiggle's semantic models are continually enhanced to comply with emerging search patterns. To ensure consistency between the semantic models used to understand your catalog data and your users' search queries, it is essential that you process your entire catalog through the /listings endpoint at least once a week. You may also use the /listings endpoint for ongoing catalog updates and additions.

To integrate the Twiggle API in your indexing process, your indexing logic should include the following steps:

  1. Extract listing features
    Send your catalog data as batches of Listing objects (the Twiggle API's input objects for raw listing data) through POST /listings calls. The API's responses will include batches of Features objects, one object per listing in the request. These objects are structured representations of product features extracted by the Twiggle API from the listing data in the request.
  2. Index listing features in Solr
    Embed Features objects into search documents and index them in Solr. This will require a one-time update to your search engine schema configuration.

Please Note: Integrating the Twiggle API as part of your indexing process will increase the overall index size. Capacity planning and resource allocation should be considered to avoid performance degradation.

Extract Listing Features

To extract listing features from your product catalog, send raw listing data (as currently stored in your catalog) to the Twiggle API through POST /listings requests. The /listings endpoint accepts listing data in the form of Listing objects, which may include product attributes (material, size, color, etc.) and listing metadata (identifiers, price, etc.). In response, the Twiggle API returns Features objects, which are structured representations of the features it extracted from the corresponding Listing objects. Embed these objects in your listings' search engine documents and index them as part of those documents.

Set the Engine Version

Whenever you send a POST /listings request, you need to declare which Engine version you are using. If you are performing a full catalog indexing cycle, always use the latest Engine version. Get the latest Engine version by calling the /versions endpoint:

GET https://<your_partner_id>.api.twiggle.com/v1.6/versions
The latest Engine version will appear as the value of the latest_engine_version field in the response:

{
 "latest_engine_version": "201712111017",
 "released_at": "1508852896"
}

Store this Engine version so that, after completing the full catalog indexing cycle, you can use it for query interpretation requests as well as for ongoing indexing of new or updated listings.
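
The lookup described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function names are hypothetical, and the `headers` argument stands in for whatever authentication credentials your account uses, which this page does not specify.

```python
import json
import urllib.request


def versions_url(partner_id: str) -> str:
    """Build the /versions URL from the pattern shown above."""
    return f"https://{partner_id}.api.twiggle.com/v1.6/versions"


def parse_latest_engine_version(body: str) -> str:
    """Extract latest_engine_version from a /versions response body."""
    return json.loads(body)["latest_engine_version"]


def fetch_latest_engine_version(partner_id: str, headers: dict,
                                timeout: float = 10.0) -> str:
    """Call GET /versions and return the latest Engine version.

    `headers` is an assumption: pass the credentials your integration uses.
    """
    request = urllib.request.Request(
        versions_url(partner_id), headers=headers, method="GET"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_latest_engine_version(response.read().decode("utf-8"))
```

Persist the returned string alongside your index so that later requests can reuse the same value.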

Read more on versioning here.

Send Listings, Get Features

Once you know which Engine version to use, you can proceed to send batches of listings through POST /listings requests.

Pass the Engine version through the engine_version query parameter:

| Parameter Name | Required? | Data Type | Description |
|---|---|---|---|
| engine_version | Required | String | The Engine version that should be used to extract Features objects. |

In addition, you need to include a request body of type application/json with a batch of Listing objects:

| Parameter Name | Required? | Data Type | Description |
|---|---|---|---|
| listings | Required | Array[Listing] | A batch of Listing objects. See the full specification in the Listing Object Structure section below. |

Example of an HTTP POST request to the /listings endpoint:

POST https://<your_partner_id>.api.twiggle.com/v1.6/listings?engine_version=201712111017
{
  "listings": [
    {
      "listing_id": "9c398fh",
      "parent_id": "3ljhe07",
      "title": "Lace-up Floral Dress",
      "description": "A stretch-knit dress featuring a lace-up V-neckline, long sleeves, and an allover floral pattern.",
      "images": [
        {
          "uri": "http://www.mycompany.com/images/productimage.jpg"
        }
      ],
      "category": {
        "id": "SA0287X9",
        "name": "Dresses",
        "path": [
          {
            "id": "ME4292K8",
            "name": "Fashion"
          },
          {
            "id": "EM425L24",
            "name": "Women"
          },
          {
            "id": "SA0287X9",
            "name": "Dresses"
          }
        ]
      },
      "brand": "Forever 21",
      "attributes": [
        {
          "attribute": {
            "id": "AT697368",
            "name": "Color"
          },
          "value": "Rust"
        },
        {
          "attribute": {
            "id": "AT845639",
            "name": "Size"
          },
          "value": "S"
        }
      ],
      "price": {
        "amount": 25,
        "currency": "USD"
      }
    }
  ]
}
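
A request like the one above could be assembled in Python roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the batch size is whatever was agreed for your integration, and any authentication headers (not covered on this page) are passed in by the caller.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, Optional


def batches(listings: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size Listing objects."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(listings), batch_size):
        yield listings[start:start + batch_size]


def build_listings_request(partner_id: str, engine_version: str, batch: list,
                           headers: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST /listings request for one batch without sending it."""
    query = urllib.parse.urlencode({"engine_version": engine_version})
    url = f"https://{partner_id}.api.twiggle.com/v1.6/listings?{query}"
    body = json.dumps({"listings": batch}).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")
```

The request object can then be sent with `urllib.request.urlopen(request, timeout=...)`.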

A successful request will yield a response with status code 200 and a body of content type application/json which contains the following fields:

| Field Name | Data Type | Description |
|---|---|---|
| _id | String | The unique identifier of the response. |
| engine_version | String | The Engine version that was used to extract Features objects. |
| listings | Array[Object] | A batch of objects that correspond to the batch of listings in the request. |

Each object in the listings array contains the following fields:

  • listing_id (String): The unique ID of the listing to which this object corresponds.
  • parent_id (String): The listing's parent ID, in case the listing is one of several product variants.
  • features (Object): The listing's Features object, which should be indexed in your search engine.

Example of a response body for the HTTP POST request to the /listings endpoint:

ElasticSearch format first, followed by Solr format:
{
  "_id": "22117132ea3711e788a842010a81060b",
  "engine_version": "201712111017",
  "listings": [
    {
      "listing_id": "9c398fh",
      "parent_id": "3ljhe07",
      "features": {
        "$concept_sm": [
          "Dress",
          "ApparelProduct",
          "FashionProduct"
        ],
        "$color_sm": [
          "RustColor",
          "WarmColor",
          "RedBrownColor",
          "Brown",
          "Red"
        ],
        "$persona_sm": [
          "Woman",
          "Adult"
        ],
        "fastener_om.$concept_sm": [
          "LaceUp"
        ],
        "apparel_sleeves_o": {
          "sleeve_length_o": {
            "$concept_sm": [
              "LongSleeve"
            ]
          }
        },
        "$brand_sm": [
          "forever21_brand"
        ],
        "stitching_level_sm": [
          "stitched"
        ],
        "design_patterns_om": [
          {
            "$concept_sm": [
              "Floral",
              "Pattern"
            ]
          }
        ],
        "apparel_neck_o": {
          "apparel_neck_shape_om": [
            {
              "$concept_sm": [
                "VNeck"
              ]
            }
          ]
        },
        "size_symbol_sm": [
          "S"
        ]
      }
    }
  ]
}

{
  "_id": "22117132ea3711e788a842010a81060b",
  "engine_version": "201712111017",
  "listings": [
    {
      "listing_id": "9c398fh",
      "parent_id": "3ljhe07",
      "features": {
        "$concept_twg_sm": [
          "Dress",
          "ApparelProduct",
          "FashionProduct"
        ],
        "$color_twg_sm": [
          "RustColor",
          "WarmColor",
          "RedBrownColor",
          "Brown",
          "Red"
        ],
        "$persona_twg_sm": [
          "Woman",
          "Adult"
        ],
        "fastener_om.$concept_twg_sm": [
          "LaceUp"
        ],
        "apparel_sleeves_o.sleeve_length_o.$concept_twg_sm": [
          "LongSleeve"
        ],
        "$brand_twg_sm": [
          "forever21_brand"
        ],
        "stitching_level_twg_sm": [
          "stitched"
        ],
        "design_patterns_om.$concept_twg_sm": [
          "Floral",
          "Pattern"
        ],
        "apparel_neck_o.apparel_neck_shape_om.$concept_twg_sm": [
          "VNeck"
        ],
        "size_symbol_twg_sm": [
          "S"
        ]
      }
    }
  ]
}

Error Handling

Partially Successful Requests

If the Twiggle API fails to extract Features objects for some, but not all, listings in a batch POST /listings request, the API will return a response with status code 200 and a body which includes an error object for each of the failed listings. Error objects will appear next to objects of successfully processed listings in the listings array. Each error object will include the ID of the failed listing, the error type, and the error description.

For example, the following request contains one valid listing and one listing that fails because it has no title field:

POST https://<your_partner_id>.api.twiggle.com/v1.6/listings?engine_version=201712111017&translate=true
{
  "listings": [
    {
      "listing_id": "1",
      "title": "Lace-up Floral Dress",
      "images": [
        {
          "uri": "http://www.mycompany.com/images/1.jpg"
        }
      ],
      "category": {
        "id": "SA0287X9"
      },
      "price": {
        "amount": 25,
        "currency": "USD"
      }
    },
    {
      "listing_id": "2",
      "images": [
        {
          "uri": "http://www.mycompany.com/images/2.jpg"
        }
      ],
      "category": {
        "id": "EK245M23"
      },
      "price": {
        "amount": 100,
        "currency": "USD"
      }
    }
  ]
}

The first listing is processed successfully, and the response includes a Features object for it. The second listing is missing the title field and therefore fails with an appropriate error message. This leads to a response with status code 200 and the following body:

{
  "_id": "858d69035a28405fb878c6a5579392fa",
  "engine_version": "201712111017",
  "listings": [
    {
      "listing_id": "1",
      "features": {
        "features.$concept_twg_sm": [
          "Dress",
          "ApparelProduct",
          "FashionProduct"
        ],
        "features.fastener_om.$concept_twg_sm": [
          "LaceUp"
        ],
        "features.design_patterns_om.$concept_twg_sm": [
          "Floral",
          "Pattern"
        ]
      }
    },
    {
      "listing_id": "2",
      "error": {
        "type": "schema_validation",
        "content": {
          "title": [
            "Missing data for required field."
          ]
        }
      }
    }
  ]
}

In case of a partially successful response, you should proceed to index the Features objects of successful listings. Listings which yielded errors require offline inspection.
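
One way to separate the two kinds of entries in a parsed response body is sketched below. The function name is hypothetical; the field names (`listings`, `listing_id`, `features`, `error`) are the ones shown in the response example above.

```python
from typing import Dict, Tuple


def split_listings_response(response_body: dict) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Return (features_by_id, errors_by_id) for a parsed POST /listings response."""
    features_by_id: Dict[str, dict] = {}
    errors_by_id: Dict[str, dict] = {}
    for item in response_body.get("listings", []):
        listing_id = item["listing_id"]
        if "error" in item:
            # Failed listing: keep the error object for offline inspection.
            errors_by_id[listing_id] = item["error"]
        else:
            # Successful listing: keep its Features object for indexing.
            features_by_id[listing_id] = item.get("features", {})
    return features_by_id, errors_by_id
```

The first dictionary can go straight to indexing, while the second can be logged or queued for review.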

Throughput Auto-scaling

The /listings endpoint supports high request throughput rates to enable high speed indexing cycles for your entire catalog. Request throughput rates for ongoing indexing of new or updated listings are typically significantly lower than those required for full catalog indexing. When you initiate a full catalog indexing cycle and increase your request throughput rate, the Twiggle API responds by automatically scaling up the throughput rates supported by the /listings endpoint. During this scale-up process, requests may temporarily fail until new servers are functional. This will result in a response with status code 429 and the following error message:

{
 "code": 429,
 "message": "Too many requests. Please try again soon."
}

To ensure proper feature extraction coverage for your catalog, implement a simple handler which retries POST /listings requests whenever you receive a response with status code 429. The scale-up process typically occurs once during a full catalog indexing cycle, and can take up to 4 minutes.
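
A retry handler of this kind might look like the sketch below. It assumes a caller-supplied `send` function (hypothetical) that performs one POST /listings request and returns the status code and body. The backoff values and the 300-second budget are illustrative choices meant to cover the scale-up window of up to 4 minutes, not values prescribed by the API.

```python
import time
from typing import Callable, Tuple


def send_with_retry(send: Callable[[], Tuple[int, str]],
                    max_wait_seconds: float = 300.0,
                    initial_delay: float = 1.0,
                    max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send() until it returns a status other than 429.

    Waits between attempts with exponential backoff, capped at max_delay,
    and raises TimeoutError once the total wait would exceed max_wait_seconds.
    """
    waited = 0.0
    delay = initial_delay
    while True:
        status, body = send()
        if status != 429:
            return status, body
        if waited + delay > max_wait_seconds:
            raise TimeoutError("still receiving 429 after the retry budget was spent")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists only so that the wait can be replaced in tests; in production the default `time.sleep` is used.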

Other API Errors

POST requests to the /listings endpoint may also result in any of the following errors:

| Error Type | Status Code | Description |
|---|---|---|
| Bad Request | 400 | Invalid request body. |
| Unauthorized Request | 401 | Missing or incorrect authentication credentials. |
| Internal Server Error | 500 | Server error on the Twiggle API end. Try resending the request. |

Timeouts

We recommend configuring a timeout limit for POST /listings requests. The limit should account for the number of listings you include in each request, as well as your desired indexing completion time. If a request times out, you may attempt to resend it.

Minimizing Indexing Time

The amount of time that the Twiggle API adds to a full catalog indexing cycle is influenced by two factors:

  1. The number of listings per request
    The larger the batch you include in each POST /listings request, the quicker indexing will be. Note that you should adjust your request timeout limit to match the batch size.
  2. The number of POST /listings requests sent concurrently
    The more requests you send in parallel, the quicker indexing will be.

As part of the integration, Twiggle will configure the API to support your desired batch size and number of concurrent requests.
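
Sending several batches in parallel could be sketched with a thread pool, as below. The `send_batch` callable is hypothetical: it stands for your own function that posts one batch to /listings and returns the parsed response. The worker count should match the concurrency configured for your integration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def process_batches(batches: Iterable[list],
                    send_batch: Callable[[list], dict],
                    max_workers: int = 4) -> List[dict]:
    """Send every batch through send_batch using up to max_workers threads.

    Responses are returned in the same order as the input batches.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_batch, batches))
```

Threads suit this workload because each worker spends most of its time waiting on the network. Note that `pool.map` submits all batches up front, so for very large catalogs you may prefer to feed it in chunks.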

Index Feature Fields

Features objects are structured JSON objects containing multiple fields that represent product features as extracted by the Twiggle API. These fields are named Feature Fields, and their primary purpose is to be indexed along with your existing product data. This will enable you to match listings to Twiggle API query interpretations. In order to index Feature Fields, you'll need a one-time update to your search engine schema configuration. Once this is done, you can proceed to embed Feature Fields inside your regular search documents, which will be sent to your search engine for indexing.

Configure Your Search Engine Schema

In order to index search documents with Feature Fields, you need to update your search engine schema with the appropriate configuration. Because of the dynamic nature of Twiggle's knowledge graph, Feature Fields can be added or modified between Engine versions, so it is not possible to create an up-front schema for all possible Feature Field names. Instead, use Solr's Dynamic Fields or ElasticSearch's Dynamic Templates to define a data type naming convention, where each data type gets a Dynamic Field pattern that indicates how it should be mapped. These patterns pick up new Feature Fields as they are encountered during indexing and instruct the search engine to index them according to the defined mapping.

These are all the Dynamic Fields you need to add to your search engine schema file:

For example, the mapping for single-valued string fields (ElasticSearch format first, followed by Solr format):
{
  "dynamic_templates": [
    {
      "attr_str": {
        "path_match": "features.*",
        "match": "*_s",
        "mapping": {
          "type": "keyword",
          "norms": true
        }
      }
    }
  ]
}

<dynamicField name="*_twg_s" type="string" indexed="true" stored="true" multiValued="false" omitNorms="false"/>

Note that the Solr field type is defined as string, whose default class is solr.StrField, which is not analyzed.

The full set of Dynamic Fields (ElasticSearch format first, followed by Solr format):
{
  "properties": { ... },
  "dynamic_templates": [
    {
      "twiggle_str": {
        "path_match": "features.*",
        "match": "*_s",
        "mapping": {
          "type": "keyword",
          "norms": true
        }
      }
    },
    {
      "twiggle_str_array": {
        "path_match": "features.*",
        "match": "*_sm",
        "mapping": {
          "type": "keyword",
          "norms": true
        }
      }
    },
    {
      "twiggle_num": {
        "path_match": "features.*",
        "match": "*_n",
        "mapping": {
          "type": "double"
        }
      }
    },
    {
      "twiggle_num_array": {
        "path_match": "features.*",
        "match": "*_nm",
        "mapping": {
          "type": "double"
        }
      }
    },
    {
      "twiggle_bool": {
        "path_match": "features.*",
        "match": "*_b",
        "mapping": {
          "type": "boolean"
        }
      }
    },
    {
      "twiggle_object": {
        "path_match": "features.*",
        "match": "*_o",
        "mapping": {
          "type": "object"
        }
      }
    },
    {
      "twiggle_object_array": {
        "path_match": "features.*",
        "match": "*_om",
        "mapping": {
          "type": "object"
        }
      }
    },
    {
      "twiggle_txt": {
        "path_match": "features.*",
        "match": "*_txt",
        "mapping": {
          "type": "text",
          "analyzer": "standard",
          "norms": true
        }
      }
    }
  ]
}

<dynamicField name="*_twg_s" type="string" indexed="true" stored="true" multiValued="false" omitNorms="false"/>
<dynamicField name="*_twg_sm" type="string" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
<dynamicField name="*_twg_b" type="boolean" indexed="true" stored="true" multiValued="false" omitNorms="false"/>
<dynamicField name="*_twg_bm" type="boolean" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
<dynamicField name="*_twg_n" type="tdouble" indexed="true" stored="true" multiValued="false" omitNorms="false"/>
<dynamicField name="*_twg_nm" type="tdouble" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
<dynamicField name="*_twg_t" type="text_general" indexed="true" stored="true" multiValued="false" omitNorms="false"/>
<dynamicField name="*$concept_twg_sm" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<dynamicField name="*features_structure" type="string" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
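
To see how the naming convention resolves a Feature Field name to a field type, the sketch below matches names against the Solr patterns listed above. It is an illustration only: it assumes that when several patterns match, the longest one wins, which you should verify against your Solr version. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns and field types copied from the Solr dynamicField list above.
SOLR_DYNAMIC_FIELDS = {
    "*_twg_s": "string",
    "*_twg_sm": "string",
    "*_twg_b": "boolean",
    "*_twg_bm": "boolean",
    "*_twg_n": "tdouble",
    "*_twg_nm": "tdouble",
    "*_twg_t": "text_general",
    "*$concept_twg_sm": "string",
    "*features_structure": "string",
}


def matching_pattern(field_name: str) -> Optional[str]:
    """Return the longest dynamic field pattern matching field_name, or None.

    Longest-match precedence is an assumption about Solr's behavior.
    """
    matches = [p for p in SOLR_DYNAMIC_FIELDS if fnmatchcase(field_name, p)]
    return max(matches, key=len) if matches else None
```

For example, `features.$color_twg_sm` resolves to `*_twg_sm`, while a regular catalog field such as `title` matches no pattern and is left to your existing schema.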

Embedding Feature Fields in Search Documents

Once your search engine schema is updated, we need to hook into the indexing process and incorporate Feature Fields into listing objects before they are indexed. We'll do so by joining each listing's Features objects with its preexisting fields.

For example, consider this simple product listing object:

{
 "id": "301092356",
 "title": "Ladies Black 100% Leather Knee High Boots",
 "description": "Knee high black leather boots with a mid heel and back zip fastening."
}

When this listing is sent (in the form of a Listing object) in a POST /listings call, the response will include a corresponding Features object:

ElasticSearch format first, followed by Solr format:
{
  "_id": "40d01198ea3f11e788a842010a81060b",
  "listings": [
    {
      "listing_id": "301092356",
      "features": {
        "$concept_sm": [ "Boot", "Shoe", "FashionProduct" ],
        "$persona_sm": [ "Woman", "Adult" ],
        "heel_o.heel_size_sm": [ "Medium" ],
        "fastener_om.fastener_position_sm": [ "Back" ],
        "$material_om.$concept_sm": [ "Leather" ],
        "$color_sm": [ "Black" ],
        "shoe_shaft_o.shaft_height_sm": [ "Knee High" ],
        "fastener_om.$concept_sm": [ "Zipper", "ApparelFastener" ]
      }
    }
  ]
}

{
  "_id": "40d01198ea3f11e788a842010a81060b",
  "listings": [
    {
      "listing_id": "301092356",
      "features": {
        "features.$concept_twg_sm": [ "Boot", "Shoe", "FashionProduct" ],          
        "features.$persona_twg_sm": [ "Woman", "Adult" ],
        "features.heel_o.heel_size_twg_sm": [ "Medium" ],
        "features.fastener_om.fastener_position_twg_sm": [ "Back" ],
        "features.$material_om.$concept_twg_sm": [ "Leather" ],
        "features.$color_twg_sm": [ "Black" ],
        "features.shoe_shaft_o.shaft_height_twg_sm": [ "Knee High" ],
        "features.fastener_om.$concept_twg_sm": [ "Zipper", "ApparelFastener" ]
      }
    }   
  ]
}

Note that the value of the id field in the search engine document is identical to that of the listing_id field in the response.

After embedding all Feature Fields from the response's Features object into the original listing document, we get a single object that is ready to be indexed in Solr:

{
 "id": "301092356",
 "title": "Ladies Black 100% Leather Knee High Boots",
 "description": "Knee high black leather boots with a mid heel and back zip fastening.",
 "features.$persona_twg_sm": [ "Woman", "Adult" ],
 "features.heel_o.heel_size_twg_sm": [ "Medium" ],
 "features.$concept_twg_sm": [ "Boot", "Shoe", "FashionProduct" ],
 "features.fastener_om.fastener_position_twg_sm": [ "Back" ],
 "features.$material_om.$concept_twg_sm": [ "Leather" ],
 "features.$color_twg_sm": [ "Black" ],
 "features.shoe_shaft_o.shaft_height_twg_sm": [ "Knee High" ],
 "features.fastener_om.$concept_twg_sm": [ "Zipper", "ApparelFastener" ]
}
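
The merge shown above can be sketched as a small Python function. It assumes the Solr-format response, where Feature Field names are already flat keys, and a search document keyed by `id`. The function name and the overlap check are illustrative additions.

```python
def embed_features(document: dict, response_listing: dict) -> dict:
    """Return a copy of document with the listing's Feature Fields merged in."""
    if document["id"] != response_listing["listing_id"]:
        raise ValueError("listing_id does not match the document id")
    features = response_listing["features"]
    # Refuse to silently overwrite fields that already exist in the document.
    overlap = document.keys() & features.keys()
    if overlap:
        raise ValueError(f"Feature Fields would overwrite existing fields: {sorted(overlap)}")
    return {**document, **features}
```

The original document is left unmodified, which makes it easy to retry the merge if a later step fails.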

Once you have completed this routine for the entire catalog as part of a daily or weekly process, you can proceed to integrate the Twiggle API in your querying process.

Listing Object Structure

| Field Name | Data Type | Description |
|---|---|---|
| listing_id | String | The listing's unique identifier. |
| parent_id | String | The listing's parent ID, in case the listing is one of several product variants. |
| title | String | The listing's title. |
| highlights | Array[String] | The listing's top selling points. |
| description | String | The listing's full textual description. |
| images | Array[Image] | The listing's set of images. See specification below. |
| category | Category | The listing's taxonomic category. See specification below. |
| brand | String | The name of the brand associated with the product. |
| attributes | Array[Attribute] | A list of attributes of this listing. This should include all attributes that aren't included in other fields. See specification below. |
| price | Price | The listing price. See specification below. |
| sub_listings | Array[Listing] | A list of nested Listing objects. This is meant for bundles comprised of independent products. |

Image Object Structure

| Field Name | Data Type | Description |
|---|---|---|
| uri | String | The image's URI. Must be openly accessible. High-resolution images are preferred. |
| metadata | Object | An open object for image metadata, e.g. point of view, product orientation, featured component, date taken, license, etc. |

Category Object Structure

| Field Name | Data Type | Description |
|---|---|---|
| id | String | The category ID in your taxonomy. |
| name | String | The category's user-facing name. |
| path | Array[Object] | The category's path in your taxonomy, structured as an array of objects ordered from the top-most category in the path down to the leaf category. Each object should contain these two fields: id (String, Required), the category's ID in your taxonomy; and name (String, Required), the category's user-facing name. |

Attribute Object Structure

| Field Name | Data Type | Description |
|---|---|---|
| attribute | Object | Each object should contain these two fields: id (String, Optional), the attribute's ID in your taxonomy; and name (String, Required), the attribute's user-facing name. |
| value | String | The attribute value's user-facing name. |

Price Object Structure

| Field Name | Data Type | Description |
|---|---|---|
| amount | Number | The numeric value of the product's price. |
| currency | String | A 3-letter ISO 4217 code for the currency in which the product is priced. |
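
To show how the object structures above fit together, the following sketch assembles a Listing object from plain Python values. The function and its parameters are hypothetical conveniences, not part of the API. It covers only a subset of the fields, and it assumes that the last entry of the category path is the listing's own category, as in the request example earlier on this page.

```python
from typing import Dict, List, Optional, Tuple


def make_listing(listing_id: str, title: str,
                 category_path: List[Tuple[str, str]],
                 attributes: Dict[str, str],
                 amount: float, currency: str,
                 parent_id: Optional[str] = None) -> dict:
    """Assemble a Listing dict.

    category_path: (id, name) pairs ordered from the top-most category to the leaf.
    attributes: attribute name -> value; attribute IDs are optional and omitted here.
    """
    if not category_path:
        raise ValueError("category_path must contain at least one category")
    if len(currency) != 3 or not currency.isalpha():
        raise ValueError("currency must be a 3-letter ISO 4217 code")
    leaf_id, leaf_name = category_path[-1]
    listing = {
        "listing_id": listing_id,
        "title": title,
        "category": {
            "id": leaf_id,
            "name": leaf_name,
            "path": [{"id": cid, "name": name} for cid, name in category_path],
        },
        "attributes": [
            {"attribute": {"name": name}, "value": value}
            for name, value in attributes.items()
        ],
        "price": {"amount": amount, "currency": currency.upper()},
    }
    if parent_id is not None:
        listing["parent_id"] = parent_id
    return listing
```

Fields not handled here, such as description, images, brand, highlights, and sub_listings, can be added to the returned dictionary in the same way.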