A custom dataset/schema is an expanded variant of one of the default schemas. The default schemas define the baseline (required) data; a custom schema appends additional information to them. There are a few reasons to do this.
For session-based interactions, you can include two additional optional fields: device and geolocation.
Field Name | Type | Example |
---|---|---|
session_device | string | “tablet” |
session_geolocation | string | “North America” |
When retrieving results from the XCB system, it is helpful to have descriptive data, such as item names or users’ ages, within the results. (Under Xineoh’s Terms of Service, you should take care not to upload metadata that could violate privacy laws in any country you operate in.)
We can also include categorical information such as genres, item types, etc.
When training the model, you can include additional information that can improve recommendation results. Metadata such as a user’s age, the sale location or an item’s cost can yield more accurate recommendations.
In the following example, we take the default schema and append two additional metadata fields, branch and name. These fields are added mainly because they help with reporting. Setting a field’s recommender_field to 1 (true) also includes it in the recommendation modelling, which can help improve the recommendation results; here, only branch is used for modelling.
Event types categorise the event values and are sent to the recommender as strings, for example “watch” or “click”.
Choose which fields to add as additional metadata carefully, since a schema can have at most 10 additional fields.
Also note that every additional field must include a “label” in the schema.
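The two rules above (at most 10 additional fields, and a label on every one) are easy to check before uploading a schema. The following is a minimal Python sketch, not part of any Xineoh tooling: the function name check_additional_fields is hypothetical, and it assumes that any field not in the default set counts as "additional". For session-based schemas, pass the session fields as part of the default set, since the examples give them no label. The sketch also flags duplicate field names.

```python
# Default interaction fields, as listed in the default schema.
DEFAULT_INTERACTION_FIELDS = {"user_id", "item_id", "event_type", "event_value", "event_date"}
MAX_ADDITIONAL_FIELDS = 10  # limit stated in this documentation


def check_additional_fields(schema: dict, default_fields: set) -> list:
    """Return a list of problems with the schema's additional (non-default) fields."""
    problems = []
    names = [field["name"] for field in schema["fields"]]
    # Every field name must be unique, otherwise two fields collide.
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        problems.append(f"duplicate field names: {duplicates}")
    additional = [f for f in schema["fields"] if f["name"] not in default_fields]
    if len(additional) > MAX_ADDITIONAL_FIELDS:
        problems.append(
            f"{len(additional)} additional fields; at most {MAX_ADDITIONAL_FIELDS} allowed"
        )
    # Each additional field needs a non-empty "label".
    for field in additional:
        if not field.get("label"):
            problems.append(f"additional field {field['name']!r} has no label")
    return problems


# Hypothetical schema whose second metadata field is missing its label.
schema = {
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "event_date", "type": "timestamp"},
        {"name": "event_meta_data_1", "label": "branch", "type": "string"},
        {"name": "event_meta_data_2", "type": "string"},
    ]
}
print(check_additional_fields(schema, DEFAULT_INTERACTION_FIELDS))
# ["additional field 'event_meta_data_2' has no label"]
```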
Dataset Type | Required Field | Type | Examples |
---|---|---|---|
Interactions | user_id | string | 4757 |
Interactions | item_id | integer | 10053 |
Interactions | event_date | timestamp | 2022-01-25 00:00:00 |
Interactions | event_type | string | “quantity”, “duration” |
Interactions | event_value | float or null | 14, 115.42 |
Interactions (session-based) | user_id | string | 10002541 |
Interactions (session-based) | item_id | integer | 4542155 |
Interactions (session-based) | event_date | timestamp | 2022-01-25 00:00:00 |
Interactions (session-based) | event_type | string | “quantity”, “duration” |
Interactions (session-based) | event_value | float or null | 14, 115.42 |
Interactions (session-based) | session_id | string | “a79dc177-91dc-435a-abbf” |
Interactions (session-based) | session_context | string | “family” |
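A quick way to confirm that a schema covers the required fields in the table above is a set lookup. This is a minimal Python sketch; the dictionary keys "interactions" and "interactions_session_based" and the function missing_required_fields are hypothetical names chosen for illustration, and the field lists are transcribed from the table.

```python
# Required fields per dataset type, transcribed from the table above.
# The dictionary keys are hypothetical names used only in this sketch.
REQUIRED_FIELDS = {
    "interactions": ["user_id", "item_id", "event_date", "event_type", "event_value"],
    "interactions_session_based": [
        "user_id", "item_id", "event_date", "event_type", "event_value",
        "session_id", "session_context",
    ],
}


def missing_required_fields(schema: dict, dataset_type: str) -> list:
    """Return the required field names (in table order) that the schema lacks."""
    present = {field["name"] for field in schema["fields"]}
    return [name for name in REQUIRED_FIELDS[dataset_type] if name not in present]


# A plain interaction schema, checked against both sets of requirements.
plain = {"fields": [{"name": n} for n in REQUIRED_FIELDS["interactions"]]}
print(missing_required_fields(plain, "interactions"))                # []
print(missing_required_fields(plain, "interactions_session_based"))  # ['session_id', 'session_context']
```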
Field Name | Type | Label | Recommender Field |
---|---|---|---|
event_meta_data_1 | string | branch | True |
event_meta_data_2 | string | name | False |
{
"name": "user_item_interaction",
"namespace": "com.xineoh.recommender.schema",
"fields": [
{
"name": "user_id",
"type": "string"
},
{
"name": "item_id",
"type": "string"
},
{
"name": "event_type",
"type": "string",
"measurement": "ordinal"
},
{
"name": "event_value",
"type": "float"
},
{
"name": "event_date",
"type": "timestamp"
},
{
"name" : "event_meta_data_1",
"label": "branch",
"type": "string",
"recommender_field": 1
},
{
"name": "event_meta_data_2",
"label": "name",
"type": "string",
"recommender_field": 0
}
],
"version": "1.0"
}
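Because every metadata field needs its own unique name, it can be safer to generate the schema than to copy field blocks by hand. This is a minimal Python sketch; build_interaction_schema and its (label, type, use_in_model) tuple format are hypothetical, and the default field list is copied from the example above (including item_id as a string).

```python
import json

# Default interaction fields, copied from the schema above.
DEFAULT_FIELDS = [
    {"name": "user_id", "type": "string"},
    {"name": "item_id", "type": "string"},
    {"name": "event_type", "type": "string", "measurement": "ordinal"},
    {"name": "event_value", "type": "float"},
    {"name": "event_date", "type": "timestamp"},
]


def build_interaction_schema(metadata: list) -> dict:
    """Append (label, type, use_in_model) metadata fields to the default fields.

    Names are numbered automatically (event_meta_data_1, _2, ...) so they stay unique.
    """
    fields = [dict(f) for f in DEFAULT_FIELDS]  # copy so the defaults are never mutated
    for index, (label, field_type, use_in_model) in enumerate(metadata, start=1):
        fields.append({
            "name": f"event_meta_data_{index}",
            "label": label,
            "type": field_type,
            "recommender_field": int(use_in_model),  # 1 = used in modelling, 0 = not used
        })
    return {
        "name": "user_item_interaction",
        "namespace": "com.xineoh.recommender.schema",
        "fields": fields,
        "version": "1.0",
    }


schema = build_interaction_schema([("branch", "string", True), ("name", "string", False)])
print(json.dumps(schema, indent=1))
```

The metadata order in the input list determines the numbering, so the first tuple always becomes event_meta_data_1.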
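The session-based version of this schema adds the session_id and session_context fields required for session-based interactions, plus the two optional session fields (device and geolocation) described above.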
{
"name": "user_item_interaction",
"namespace": "com.xineoh.recommender.schema",
"fields": [
{
"name": "user_id",
"type": "string"
},
{
"name": "item_id",
"type": "string"
},
{
"name": "event_type",
"type": "string",
"measurement": "ordinal"
},
{
"name": "event_value",
"type": "float"
},
{
"name": "event_date",
"type": "timestamp"
},
{
"name": "session_id",
"type": "string"
},
{
"name": "session_context",
"type": "string"
},
{
"name": "session_device",
"type": "string"
},
{
"name": "session_geolocation",
"type": "string"
},
{
"name" : "event_meta_data_1",
"label": "branch",
"type": "string",
"recommender_field": 1
},
{
"name": "event_meta_data_2",
"label": "name",
"type": "string",
"recommender_field": 0
}
],
"version": "1.0"
}
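The session-based schema differs from the plain one only by the four session fields, so they can be spliced into an existing schema rather than written out again. This is a minimal Python sketch; add_session_fields is a hypothetical helper, and inserting the fields directly after event_date simply mirrors the example above (whether field order matters to the XCB system is not stated here).

```python
import copy

# Session fields as they appear in the session-based schema above.
SESSION_FIELDS = [
    {"name": "session_id", "type": "string"},
    {"name": "session_context", "type": "string"},
    {"name": "session_device", "type": "string"},       # optional
    {"name": "session_geolocation", "type": "string"},  # optional
]


def add_session_fields(schema: dict, include_optional: bool = True) -> dict:
    """Return a copy of the schema with session fields inserted after event_date."""
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    names = [f["name"] for f in result["fields"]]
    position = names.index("event_date") + 1
    session = SESSION_FIELDS if include_optional else SESSION_FIELDS[:2]
    result["fields"][position:position] = copy.deepcopy(session)
    return result


# Hypothetical, shortened base schema.
base = {"name": "user_item_interaction", "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "event_date", "type": "timestamp"},
    {"name": "event_meta_data_1", "label": "branch", "type": "string", "recommender_field": 1},
]}
print([f["name"] for f in add_session_fields(base)["fields"]])
# ['user_id', 'event_date', 'session_id', 'session_context', 'session_device', 'session_geolocation', 'event_meta_data_1']
```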
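A second interaction example appends a single integer metadata field, watch_duration, which is flagged for use in recommendation modelling.
Field Name | Type | Label | Recommender Field |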
---|---|---|---|
event_meta_data_1 | integer | watch_duration | True |
{
"name": "user_item_interaction",
"namespace": "com.xineoh.recommender.schema",
"fields": [
{
"name": "user_id",
"type": "string"
},
{
"name": "item_id",
"type": "string"
},
{
"name": "event_type",
"type": "string",
"measurement": "ordinal"
},
{
"name": "event_value",
"type": "float"
},
{
"name": "event_date",
"type": "timestamp"
},
{
"name" : "event_meta_data_1",
"label": "watch_duration",
"type": "integer",
"recommender_field": 1
}
],
"version": "1.0"
}
As before, the session-based version adds session_id and session_context, plus the two optional session fields (device and geolocation).
{
"name": "user_item_interaction",
"namespace": "com.xineoh.recommender.schema",
"fields": [
{
"name": "user_id",
"type": "string"
},
{
"name": "item_id",
"type": "string"
},
{
"name": "event_type",
"type": "string",
"measurement": "ordinal"
},
{
"name": "event_value",
"type": "float"
},
{
"name": "event_date",
"type": "timestamp"
},
{
"name": "session_id",
"type": "string"
},
{
"name": "session_context",
"type": "string"
},
{
"name": "session_device",
"type": "string"
},
{
"name": "session_geolocation",
"type": "string"
},
{
"name" : "event_meta_data_1",
"label": "watch_duration",
"type": "integer",
"recommender_field": 1
}
],
"version": "1.0"
}
The following example is the default user schema with two additional metadata fields appended. The first, “geolocation”, tells the recommender where the user shops and is flagged for use in recommendation modelling, which can help improve the recommendations. The second, “name”, is a descriptive field only.
Choose which fields to add as additional metadata carefully, since a schema can have at most 10 additional fields.
Also note that every additional field must include a “label” in the schema.
When the categorical flag is set, a field can hold categorical information such as genres or nested types. The field type should be string, with the values separated by pipes (|).
Example
Data used: the location where a user shops
Format used: {suburb}|{county}|{state}|{country}
Categorical pipe-separated string: Beverly Hills|Los Angeles|California|USA
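Building and reading such a value is a plain join and split on the pipe character. This is a minimal Python sketch; to_categorical and from_categorical are hypothetical helpers, and because this documentation does not describe any escaping for a literal "|", the sketch rejects values that contain one.

```python
SEPARATOR = "|"


def to_categorical(values: list) -> str:
    """Join values into a pipe-separated categorical string."""
    for value in values:
        # No escaping rule is documented, so refuse values containing the separator.
        if SEPARATOR in value:
            raise ValueError(f"value {value!r} contains the separator {SEPARATOR!r}")
    return SEPARATOR.join(values)


def from_categorical(text: str) -> list:
    """Split a pipe-separated categorical string back into its values."""
    return text.split(SEPARATOR)


location = to_categorical(["Beverly Hills", "Los Angeles", "California", "USA"])
print(location)                    # Beverly Hills|Los Angeles|California|USA
print(from_categorical(location))  # ['Beverly Hills', 'Los Angeles', 'California', 'USA']
```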
If you don’t have a lot of metadata to capture, you could instead use separate fields in the schema, for example one field for the suburb, another for the country, and so on.
Field Name | Type |
---|---|
user_id | integer |
created_dt | timestamp |
Field Name | Type | Label | Recommender Field |
---|---|---|---|
meta_data_1 | string | geolocation | True |
meta_data_2 | string | name | False |
{
"name": "user_meta_data",
"namespace": "com.xineoh.recommender.schema",
"fields": [
{
"name": "user_id",
"label": "user",
"type": "integer"
},
{
"name": "meta_data_1",
"label": "geolocation",
"type": "string",
"categorical": false,
"recommender_field": 1
},
{
"name": "meta_data_2",
"label": "name",
"type": "string",
"categorical": false,
"recommender_field": 0
},
{
"name": "created_dt",
"type": "timestamp"
}
],
"version": "1.0"
}
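Before exporting user metadata, it can help to check each record against the field types in the schema above. This is a minimal Python sketch under stated assumptions: the mapping from schema types to Python types (PYTHON_TYPES) is an assumption, the record is a hypothetical in-memory dictionary, and the actual upload format accepted by the XCB system is not described in this section.

```python
from datetime import datetime

# Python types accepted for each schema type (an assumption of this sketch).
PYTHON_TYPES = {"integer": int, "float": (int, float), "string": str, "timestamp": datetime}

# Field names and types from the user_meta_data schema above.
USER_META_DATA_FIELDS = [
    {"name": "user_id", "type": "integer"},
    {"name": "meta_data_1", "type": "string"},   # label: geolocation
    {"name": "meta_data_2", "type": "string"},   # label: name
    {"name": "created_dt", "type": "timestamp"},
]


def type_errors(record: dict, fields: list) -> list:
    """Report fields that are missing or whose value has the wrong Python type."""
    errors = []
    for field in fields:
        value = record.get(field["name"])
        expected = PYTHON_TYPES[field["type"]]
        if value is None:
            errors.append(f"{field['name']}: missing")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field['name']}: expected {field['type']}, got {type(value).__name__}")
    return errors


# Hypothetical user record.
record = {
    "user_id": 4757,
    "meta_data_1": "North America",
    "meta_data_2": "example name",
    "created_dt": datetime(2022, 1, 25),
}
print(type_errors(record, USER_META_DATA_FIELDS))                        # []
print(type_errors({**record, "user_id": "4757"}, USER_META_DATA_FIELDS))  # ['user_id: expected integer, got str']
```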
In the following example, we take the default item schema and append five additional metadata fields. These fields are added as descriptive fields and will also help with creating reports later. Three of them are also flagged for inclusion in the recommendation modelling, which can help improve the recommendation results.
Choose which fields to add as additional metadata carefully, since a schema can have at most 10 additional fields.
Also note that every additional field must include a “label” in the schema.
When the categorical flag is set, a field can hold categorical information such as genres or nested types. The field type should be string, with the values separated by pipes (|).
Example
Data used: metadata from “Spider-Man: No Way Home”
Format used: {director}|{duration}|{release date}
Categorical pipe-separated string: Jon Watts|2h28m|2021
If you don’t have a lot of metadata to capture, you could instead use separate fields in the schema, for example one field each for the director, the release date and the duration.
Field Name | Type |
---|---|
item_id | integer |
created_dt | timestamp |
Field Name | Type | Label | Recommender Field | Hierarchy Level |
---|---|---|---|---|
meta_data_1 | string | description | False | null |
meta_data_2 | string | brand | False | 1 |
meta_data_3 | string | category | True | 2 |
meta_data_4 | string | type | True | 3 |
meta_data_5 | string | garment_type | True | 4 |
{
"name": "item_meta_data",
"namespace": "com.xineoh.recommender.schema",
"fields": [
{
"name": "item_id",
"label": "item",
"type": "integer"
},
{
"name": "meta_data_1",
"label": "description",
"type": "string",
"categorical": false,
"recommender_field": 0
},
{
"name": "meta_data_2",
"label": "brand",
"type": "string",
"categorical": false,
"recommender_field": 0,
"hierarchy_level": 1
},
{
"name": "meta_data_3",
"label": "category",
"type": "string",
"categorical": false,
"recommender_field": 1,
"hierarchy_level": 2
},
{
"name": "meta_data_4",
"label": "type",
"type": "string",
"categorical": false,
"recommender_field": 1,
"hierarchy_level": 3
},
{
"name": "meta_data_5",
"label": "garment_type",
"type": "string",
"categorical": false,
"recommender_field": 1,
"hierarchy_level": 4
},
{
"name": "created_dt",
"type": "timestamp"
}
],
"version": "1.0"
}
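The hierarchy_level values above order the brand, category, type and garment_type fields. Reading them in level order gives a path for each item. This is a minimal Python sketch; it assumes level 1 is the broadest level (as the brand, category, type, garment_type ordering suggests), the helper hierarchy_path is hypothetical, and the item values are made up for illustration.

```python
# Labels and hierarchy levels from the item_meta_data schema above.
ITEM_META_DATA_FIELDS = [
    {"name": "meta_data_1", "label": "description"},  # no hierarchy level
    {"name": "meta_data_2", "label": "brand", "hierarchy_level": 1},
    {"name": "meta_data_3", "label": "category", "hierarchy_level": 2},
    {"name": "meta_data_4", "label": "type", "hierarchy_level": 3},
    {"name": "meta_data_5", "label": "garment_type", "hierarchy_level": 4},
]


def hierarchy_path(item: dict, fields: list) -> list:
    """Return the item's values ordered by hierarchy level, skipping fields without one."""
    ranked = sorted(
        (f for f in fields if f.get("hierarchy_level") is not None),
        key=lambda f: f["hierarchy_level"],
    )
    return [item[f["name"]] for f in ranked]


# Hypothetical item values.
item = {
    "meta_data_1": "Plain cotton T-shirt",
    "meta_data_2": "ExampleBrand",
    "meta_data_3": "Clothing",
    "meta_data_4": "Tops",
    "meta_data_5": "T-shirt",
}
print(" > ".join(hierarchy_path(item, ITEM_META_DATA_FIELDS)))
# ExampleBrand > Clothing > Tops > T-shirt
```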
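A second item example appends three metadata fields: duration, title and genre. Only genre is used in the recommendation modelling, and it is flagged as categorical in the schema, so it can hold several genres as a pipe-separated string.
Field Name | Type | Label | Recommender Field | Hierarchy Level |
---|---|---|---|---|
meta_data_1 | float | duration | False | null |
meta_data_2 | string | title | False | null |
meta_data_3 | string | genre | True | 1 |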
{
"name": "item_meta_data",
"namespace": "com.xineoh.recommender.schema",
"fields": [
{
"name": "item_id",
"label": "item",
"type": "integer"
},
{
"name": "meta_data_1",
"label": "duration",
"type": "float",
"categorical": false,
"recommender_field": 0
},
{
"name": "meta_data_2",
"label": "title",
"type": "string",
"categorical": false,
"recommender_field": 0
},
{
"name": "meta_data_3",
"label": "genre",
"type": "string",
"categorical": true,
"recommender_field": 1,
"hierarchy_level": 1
},
{
"name": "created_dt",
"type": "timestamp"
}
],
"version": "1.0"
}
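Since the schemas are plain JSON, they can be loaded with Python's standard json module to see at a glance which metadata fields feed the model and which are categorical. This is a minimal Python sketch; field_roles and its bucket names are hypothetical, and it treats any field without a recommender_field key as a default field. The schema string below is the example above with namespace and version omitted.

```python
import json

# The video item_meta_data schema above, as a JSON string (namespace and version omitted).
SCHEMA_JSON = """
{
  "name": "item_meta_data",
  "fields": [
    {"name": "item_id", "label": "item", "type": "integer"},
    {"name": "meta_data_1", "label": "duration", "type": "float",
     "categorical": false, "recommender_field": 0},
    {"name": "meta_data_2", "label": "title", "type": "string",
     "categorical": false, "recommender_field": 0},
    {"name": "meta_data_3", "label": "genre", "type": "string",
     "categorical": true, "recommender_field": 1, "hierarchy_level": 1},
    {"name": "created_dt", "type": "timestamp"}
  ]
}
"""


def field_roles(schema: dict) -> dict:
    """Group metadata field labels by how the schema flags them."""
    roles = {"model": [], "not_in_model": [], "categorical": []}
    for field in schema["fields"]:
        if "recommender_field" not in field:
            continue  # default fields carry no metadata flags
        bucket = "model" if field["recommender_field"] else "not_in_model"
        roles[bucket].append(field["label"])
        if field.get("categorical"):
            roles["categorical"].append(field["label"])
    return roles


print(field_roles(json.loads(SCHEMA_JSON)))
# {'model': ['genre'], 'not_in_model': ['duration', 'title'], 'categorical': ['genre']}
```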