Customising Datasets and Schemas

A custom dataset or schema extends one of the default schemas. The default schemas contain the baseline, required data; a custom schema appends additional information to them. There are a few reasons to do this.

Descriptive Metadata

When retrieving results from the XCB system, it is helpful to have descriptive data within the results, such as the item name or the user’s age. (Under Xineoh’s Terms of Service, you should take care not to upload metadata that could violate privacy laws in any country you operate in.)

We can also include categorical information such as genres, item types, etc.

Recommendation Improvement - Metadata

When training the model, you can include additional information that can improve recommendation results. Metadata such as user’s age, sale location, item cost, and so forth can yield more accurate recommendations.

Interactions Schema Expanded

In the following examples, we take the default schema and append additional metadata fields. These fields are added because they help with reporting. To include a field in the recommendation modelling, set its recommender_field to true in the schema (written as 1 in the JSON examples); this can help improve the recommendation results.

Event types categorise the event values and are sent to the recommender as strings, e.g. “watch” or “click”.

Choose the additional metadata fields carefully, since a schema can have at most 10 additional fields.

Also, please note that every additional field must include a “label” in the schema.
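
The two rules above (at most 10 additional fields, and a label on each one) can be checked before a schema is submitted. The following is a minimal sketch in Python, not part of any Xineoh tooling: the helper name check_additional_fields and the required_names parameter are assumptions made for illustration, and the schema is passed in as a plain dictionary.

```python
# Hypothetical helper for illustration; it is not part of any Xineoh SDK.
MAX_ADDITIONAL_FIELDS = 10  # limit stated in this guide


def check_additional_fields(schema, required_names):
    """Return a list of problems found in the schema's additional fields."""
    # Every field that is not part of the default schema counts as additional.
    additional = [f for f in schema["fields"] if f["name"] not in required_names]
    problems = []
    if len(additional) > MAX_ADDITIONAL_FIELDS:
        problems.append(
            f"{len(additional)} additional fields; the maximum is {MAX_ADDITIONAL_FIELDS}"
        )
    for field in additional:
        if not field.get("label"):
            problems.append(f"field '{field['name']}' has no label")
    return problems


schema = {
    "name": "user_meta_data",
    "fields": [
        {"name": "user_id", "type": "integer"},
        {"name": "meta_data_1", "label": "branch", "type": "string", "recommender_field": 1},
        {"name": "meta_data_2", "type": "string", "recommender_field": 0},  # label missing
        {"name": "created_dt", "type": "timestamp"},
    ],
}
problems = check_additional_fields(schema, required_names={"user_id", "created_dt"})
print(problems)  # ["field 'meta_data_2' has no label"]
```

An empty list means both rules are satisfied.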

Default/Required Schema
Field Name    Type
user_id       integer
created_dt    timestamp
Additional Metadata (Ecommerce Example)
Field Name     Type      Label     Recommender Field
meta_data_1    string    branch    True
meta_data_2    string    name      False
The Final Schema (Ecommerce Example)
{
	"name": "user_meta_data",
	"namespace": "com.xineoh.recommender.schema",
	"fields": [{
		"name": "user_id",
		"label": "sku",
		"type": "integer"
	}, {
		"name": "meta_data_1",
		"label": "branch",
		"type": "string",
		"categorical": false,
		"recommender_field": 1
	}, {
		"name": "meta_data_2",
		"label": "name",
		"type": "string",
		"categorical": false,
		"recommender_field": 0
	}, {
		"name": "created_dt",
		"type": "timestamp"
	}],
	"version": "1.0"
}
Additional Metadata (Media Example)
Field Name           Type      Label     Recommender Field
event_meta_data_1    string    device    True
The Final Schema (Media Example)
{
	"name": "user_item_interaction",
	"namespace": "com.xineoh.recommender.schema",
	"fields": [
		{
			"name": "user_id",
			"type": "string"
		},
		{
			"name": "item_id",
			"type": "integer"
		},
		{
			"name": "event_type",
			"type": "string"
		},
		{
			"name": "event_value",
			"type": "float"
		},
		{
			"name": "event_meta_data_1",
			"label": "device",
			"type": "string",
			"recommender_field": 1
		},
		{
			"name": "event_date",
			"type": "timestamp"
		}
	],
	"version": "1.0"
}
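
Before uploading interaction data, it can help to confirm that each record matches the field types declared in the schema. The sketch below does this in Python under stated assumptions: the mapping from schema type names to Python types is our own guess (in particular, representing "timestamp" as a datetime object), the helper name mismatched_fields is hypothetical, and the record values are made up.

```python
from datetime import datetime, timezone

# Assumed mapping from schema type names to Python types (not taken from XCB docs).
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": (int, float),
    "timestamp": datetime,
}


def mismatched_fields(record, schema):
    """Return the names of schema fields that are missing or have the wrong type."""
    bad = []
    for field in schema["fields"]:
        value = record.get(field["name"])  # None when the field is missing
        if not isinstance(value, PYTHON_TYPES[field["type"]]):
            bad.append(field["name"])
    return bad


schema = {
    "name": "user_item_interaction",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "item_id", "type": "integer"},
        {"name": "event_type", "type": "string"},
        {"name": "event_value", "type": "float"},
        {"name": "event_meta_data_1", "label": "device", "type": "string", "recommender_field": 1},
        {"name": "event_date", "type": "timestamp"},
    ],
}

# Illustrative record; the values are invented.
record = {
    "user_id": "u-1001",
    "item_id": 42,
    "event_type": "watch",
    "event_value": 1.0,
    "event_meta_data_1": "mobile",
    "event_date": datetime(2022, 1, 15, tzinfo=timezone.utc),
}
print(mismatched_fields(record, schema))  # []
```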

Users Schema Expanded

The following example is the default schema with two additional metadata fields appended. The “name” field is a descriptive field. The “geolocation” field tells the recommender where the user shops from, which can help improve the recommendations.

Choose the additional metadata fields carefully, since a schema can have at most 10 additional fields.

Also, please note that every additional field must include a “label” in the schema.

Using Categorical Data

When the categorical flag is set, a field can hold categorical information such as genres or nested types. The field type should be string, and the values should be pipe-separated.

Example

Metadata Used: suburb, county, city, and country

Format Used: {suburb}|{county}|{city}|{country}

Categorical pipe separated string: Beverly Hills|Los Angeles|California|USA

If you don’t have a lot of metadata to capture, you could use separate fields in the schema instead, e.g. a field for the suburb, a field for the country, and so forth.
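
A pipe-separated value like the one above can be assembled from its parts with a few lines of code. This is a minimal Python sketch; the helper name to_categorical is hypothetical, and rejecting values that already contain a pipe is our own precaution rather than a documented XCB rule.

```python
def to_categorical(values, sep="|"):
    """Join values into one separator-delimited string."""
    cleaned = []
    for value in values:
        text = str(value).strip()
        # A separator inside a value would be read back as an extra category.
        if sep in text:
            raise ValueError(f"value {text!r} contains the separator {sep!r}")
        cleaned.append(text)
    return sep.join(cleaned)


geolocation = to_categorical(["Beverly Hills", "Los Angeles", "California", "USA"])
print(geolocation)  # Beverly Hills|Los Angeles|California|USA
```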

Default/Required Schema
Field Name    Type
user_id       integer
created_dt    timestamp
Additional Metadata
Field Name     Type      Label          Recommender Field
meta_data_1    string    geolocation    True
meta_data_2    string    name           False
The Final Schema
{
	"name": "user_meta_data",
	"namespace": "com.xineoh.recommender.schema",
	"fields": [
		{
			"name": "user_id",
			"label": "user",
			"type": "integer"
		},
		{
			"name": "meta_data_1",
			"label": "geolocation",
			"type": "string",
			"categorical": false,
			"recommender_field": 1
		},
		{
			"name": "meta_data_2",
			"label": "name",
			"type": "string",
			"categorical": false,
			"recommender_field": 0
		},
		{
			"name": "created_dt",
			"type": "timestamp"
		}
	],
	"version": "1.0"
}
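
The final schema above follows directly from the two tables: the required ID field, then one entry per additional metadata row, then the timestamp field. The sketch below generates it in Python. The function build_schema and its parameters are hypothetical names chosen for illustration; the namespace, version, and field layout are copied from the examples in this guide, and "categorical" is always set to false here for simplicity.

```python
import json


def build_schema(name, id_field, additional, timestamp_field="created_dt"):
    """Assemble a schema dictionary from (label, type, in_recommender) rows."""
    fields = [id_field]
    for index, (label, type_name, in_recommender) in enumerate(additional, start=1):
        fields.append({
            "name": f"meta_data_{index}",
            "label": label,
            "type": type_name,
            "categorical": False,
            "recommender_field": 1 if in_recommender else 0,
        })
    fields.append({"name": timestamp_field, "type": "timestamp"})
    return {
        "name": name,
        "namespace": "com.xineoh.recommender.schema",
        "fields": fields,
        "version": "1.0",
    }


users_schema = build_schema(
    "user_meta_data",
    {"name": "user_id", "label": "user", "type": "integer"},
    [("geolocation", "string", True), ("name", "string", False)],
)
print(json.dumps(users_schema, indent="\t"))
```

Keeping the additional fields as rows in one list also makes it easy to see how close the schema is to the 10-field limit.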

Items Schema Expanded

In the following example, we use the default schema and append five additional metadata fields. These metadata fields are being added as descriptive fields and they will also help with creating reports later. Three of the fields are also flagged to be included in the recommendation modelling, and this will help improve the recommendation results.

Choose the additional metadata fields carefully, since a schema can have at most 10 additional fields.

Also, please note that every additional field must include a “label” in the schema.

Using Categorical Data

When the categorical flag is set, a field can hold categorical information such as genres or nested types. The field type should be string, and the values should be pipe-separated.

Example

Data used: Metadata from “Spider-Man: No Way Home.”

Format Used: {director}|{duration}|{release date}

Categorical pipe separated string: Jon Watts|2h28m|2021

If you don’t have a lot of metadata to capture, you could use separate fields in the schema instead, e.g. a field for the director, a field for the release date, and a field for the duration.
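
When reading results back, a pipe-separated value can be split into named parts again, provided the format used to build it is known. The following Python sketch illustrates this; the helper name from_categorical and the key names are assumptions made for this example.

```python
def from_categorical(text, keys, sep="|"):
    """Split a separator-delimited string and pair each part with its key."""
    parts = text.split(sep)
    if len(parts) != len(keys):
        raise ValueError(f"expected {len(keys)} parts, found {len(parts)}")
    return dict(zip(keys, parts))


movie = from_categorical("Jon Watts|2h28m|2021", ["director", "duration", "release_date"])
print(movie["director"])      # Jon Watts
print(movie["release_date"])  # 2021
```

Note that every part comes back as a string, so a value such as the release year needs to be converted if it is used as a number.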

Default/Required Schema
Field Name    Type
item_id       integer
created_dt    timestamp
Additional Metadata (Ecommerce Example)
Field Name     Type      Label           Recommender Field    Hierarchy Level
meta_data_1    string    description     False                null
meta_data_2    string    brand           False                1
meta_data_3    string    category        True                 2
meta_data_4    string    type            True                 3
meta_data_5    string    garment_type    True                 4
The Final Schema (Ecommerce Example)
{
	"name": "item_meta_data",
	"namespace": "com.xineoh.recommender.schema",
	"fields": [
		{
			"name": "item_id",
			"label": "item",
			"type": "integer"
		},
		{
			"name": "meta_data_1",
			"label": "description",
			"type": "string",
			"categorical": false,
			"recommender_field": 0
		},
		{
			"name": "meta_data_2",
			"label": "brand",
			"type": "string",
			"categorical": false,
			"recommender_field": 0,
			"hierarchy_level": 1
		},
		{
			"name": "meta_data_3",
			"label": "category",
			"type": "string",
			"categorical": false,
			"recommender_field": 1,
			"hierarchy_level": 2
		},
		{
			"name": "meta_data_4",
			"label": "type",
			"type": "string",
			"categorical": false,
			"recommender_field": 1,
			"hierarchy_level": 3
		},
		{
			"name": "meta_data_5",
			"label": "garment_type",
			"type": "string",
			"categorical": false,
			"recommender_field": 1,
			"hierarchy_level": 4
		},
		{
			"name": "created_dt",
			"type": "timestamp"
		}
	],
	"version": "1.0"
}
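
To review the hierarchy defined in a schema like the one above, the fields can be listed in hierarchy_level order. This is a small Python sketch with a hypothetical helper name, hierarchy_labels. It only sorts by the declared level; how the recommender itself interprets the levels is not covered here.

```python
def hierarchy_labels(schema):
    """Return labels of fields that declare a hierarchy_level, lowest level first."""
    levelled = [f for f in schema["fields"] if f.get("hierarchy_level") is not None]
    levelled.sort(key=lambda f: f["hierarchy_level"])
    return [f["label"] for f in levelled]


# Trimmed copy of the ecommerce item schema, deliberately listed out of order.
item_schema = {
    "fields": [
        {"name": "item_id", "label": "item", "type": "integer"},
        {"name": "meta_data_1", "label": "description", "type": "string"},
        {"name": "meta_data_5", "label": "garment_type", "type": "string", "hierarchy_level": 4},
        {"name": "meta_data_3", "label": "category", "type": "string", "hierarchy_level": 2},
        {"name": "meta_data_2", "label": "brand", "type": "string", "hierarchy_level": 1},
        {"name": "meta_data_4", "label": "type", "type": "string", "hierarchy_level": 3},
    ]
}
print(" > ".join(hierarchy_labels(item_schema)))  # brand > category > type > garment_type
```
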
Additional Metadata (Media Example)
Field Name     Type      Label       Recommender Field    Hierarchy Level
meta_data_1    float     duration    False                null
meta_data_2    string    title       False                null
meta_data_3    string    genre       True                 1
The Final Schema (Media Example)
{
	"name": "item_meta_data",
	"namespace": "com.xineoh.recommender.schema",
	"fields": [
		{
			"name": "item_id",
			"label": "item",
			"type": "integer"
		},
		{
			"name": "meta_data_1",
			"label": "duration",
			"type": "float",
			"categorical": false,
			"recommender_field": 0
		},
		{
			"name": "meta_data_2",
			"label": "title",
			"type": "string",
			"categorical": false,
			"recommender_field": 0
		},
		{
			"name": "meta_data_3",
			"label": "genre",
			"type": "string",
			"categorical": true,
			"recommender_field": 1,
			"hierarchy_level": 1
		},
		{
			"name": "created_dt",
			"type": "timestamp"
		}
	],
	"version": "1.0"
}