Step 2: Data Transformation

Next, transform the data into CSV (comma-separated values) files so that it can be uploaded to your S3 bucket.
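The following is a minimal Python sketch of this step. It assumes your source data is already available as a list of dictionaries. The `records_to_csv` helper is hypothetical and not part of any Xineoh tooling. It uses the standard `csv` module with `QUOTE_ALL` so that every value is double-quoted, matching the sample CSV in this section. Uploading the resulting file to S3 is not shown.

```python
import csv
import io


def records_to_csv(records, columns):
    """Serialize a list of dicts to CSV text with every value quoted.

    `columns` fixes the column order and should list the schema's field
    names in schema order.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        quoting=csv.QUOTE_ALL,  # quote every field, as in the sample CSV
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical in-memory records, trimmed to four columns for brevity.
columns = ["user_id", "item_id", "event_type", "event_value"]
rows = [
    {"user_id": "4757", "item_id": 10053, "event_type": "quantity", "event_value": 15},
]
print(records_to_csv(rows, columns), end="")
```

By default, `DictWriter` raises `ValueError` if a record contains a key that is not in `columns`, which catches misspelled field names early. Missing keys are written as empty strings.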

Each CSV must follow the schema of its dataset. For more information, see the Datasets and Schemas section.

Below is an example of an e-commerce interactions schema and what the matching CSV should look like.

Schema used

{
	"name": "user_item_interaction",
	"namespace": "com.xineoh.recommender.schema",
	"fields": [{
		"name": "user_id",
		"type": "string"
	}, {
		"name": "item_id",
		"type": "integer"
	}, {
		"name": "event_type",
		"type": "string",
		"measurement": "ordinal"
	}, {
		"name": "event_value",
		"type": "float"
	}, {
		"name": "event_meta_data_1",
		"label": "selling_price",
		"type": "float",
		"recommender_field": 0
	}, {
		"name": "event_meta_data_2",
		"label": "cost",
		"type": "float",
		"recommender_field": 0
	}, {
		"name": "event_meta_data_3",
		"label": "invoice_id",
		"type": "float",
		"recommender_field": 0
	}, {
		"name": "event_date",
		"type": "timestamp"
	}],
	"version": "1.0"
}
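In the sample CSV for this schema, the header row lists each field's `name` in the order the fields appear in `fields`. The metadata columns are headed `event_meta_data_1` and so on, not by their `label`s (`selling_price`, `cost`, `invoice_id`). The short sketch below assumes this rule holds in general and derives the header line from the schema JSON. The `csv_header` helper is hypothetical.

```python
import json

# Field names and types from the user_item_interaction schema above
# (other keys omitted for brevity).
SCHEMA_JSON = """
{
  "name": "user_item_interaction",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "item_id", "type": "integer"},
    {"name": "event_type", "type": "string"},
    {"name": "event_value", "type": "float"},
    {"name": "event_meta_data_1", "label": "selling_price", "type": "float"},
    {"name": "event_meta_data_2", "label": "cost", "type": "float"},
    {"name": "event_meta_data_3", "label": "invoice_id", "type": "float"},
    {"name": "event_date", "type": "timestamp"}
  ]
}
"""


def csv_header(schema):
    """Return CSV column names in schema order, using `name` rather than `label`."""
    return [field["name"] for field in schema["fields"]]


schema = json.loads(SCHEMA_JSON)
print(",".join(f'"{column}"' for column in csv_header(schema)))
```

This prints the same quoted header line that opens the sample CSV.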

CSV


"user_id","item_id","event_type","event_value","event_meta_data_1","event_meta_data_2","event_meta_data_3","event_date"
"4757","10053","quantity","15","113.466667","49.566667","3112572","2012-09-03 00:00:00"
"4757","301656","quantity","6","111.670000","52.385000","3112572","2012-09-03 00:00:00"
"4738","211","quantity","2","91.090000","37.945000","3112573","2012-09-03 00:00:00"
"4738","270","quantity","2","77.860000","29.760000","3112573","2012-09-03 00:00:00"
"4738","10050","quantity","3","111.180000","47.785000","3112573","2012-09-03 00:00:00"
"4424","382","quantity","4","107.780000","95.800000","3112574","2012-09-03 00:00:00"
"12091","1","quantity","19","110.690000","62.652500","3112575","2012-09-03 00:00:00"
...
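Before uploading, it can help to check each file against its schema. The sketch below makes two assumptions. First, the schema types map to Python parsers as shown (`string` to `str`, `integer` to `int`, `float` to `float`). Second, `timestamp` values use the `YYYY-MM-DD HH:MM:SS` format seen in the sample rows. Both the mapping and the `validate_csv` helper are hypothetical and are not an official validator.

```python
import csv
import io
from datetime import datetime

# Assumed mapping from schema type names to Python parsers.
# The timestamp format is inferred from the sample rows above.
PARSERS = {
    "string": str,
    "integer": int,
    "float": float,
    "timestamp": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
}

# (name, type) pairs in the order given by the schema above.
FIELDS = [
    ("user_id", "string"),
    ("item_id", "integer"),
    ("event_type", "string"),
    ("event_value", "float"),
    ("event_meta_data_1", "float"),
    ("event_meta_data_2", "float"),
    ("event_meta_data_3", "float"),
    ("event_date", "timestamp"),
]


def validate_csv(text):
    """Return (line_number, column, value) tuples for values that fail to parse.

    Line numbers assume no quoted field spans multiple lines.
    """
    reader = csv.DictReader(io.StringIO(text))
    expected = [name for name, _ in FIELDS]
    if reader.fieldnames != expected:
        # Wrong or misordered header: report it and skip row checks.
        return [(1, "header", reader.fieldnames)]
    errors = []
    for line_no, row in enumerate(reader, start=2):
        for name, type_name in FIELDS:
            value = row[name]
            if value is None:  # row has fewer fields than the header
                errors.append((line_no, name, value))
                continue
            try:
                PARSERS[type_name](value)
            except ValueError:
                errors.append((line_no, name, value))
    return errors


sample = (
    '"user_id","item_id","event_type","event_value",'
    '"event_meta_data_1","event_meta_data_2","event_meta_data_3","event_date"\n'
    '"4757","10053","quantity","15","113.466667","49.566667","3112572","2012-09-03 00:00:00"\n'
    '"4738","211","quantity","two","91.090000","37.945000","3112573","2012-09-03 00:00:00"\n'
)
print(validate_csv(sample))  # [(3, 'event_value', 'two')]
```

The second data row in `sample` is deliberately invalid because `"two"` is not a float, so the call reports a single error for line 3.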