Builtin saving scripts

Ayakashi comes bundled with a set of scripts that save extracted data without requiring any extra code.

Limitations

The builtin saving scripts are designed to work out of the box, without any extra code.
For that reason, they only work with data in the format returned by calling extract with props.
Both a single extracted prop and multiple extracted props wrapped in an object work:

return ayakashi.extract("myProp", "text");

const data1 = ayakashi.extract("myProp1", "text");
const data2 = ayakashi.extract("myProp2", "text");
const data3 = ayakashi.extract("myProp3", "text");

return {data1, data2, data3};

When wrapping multiple extracted props in an object and these props contain multiple matches, the data will be grouped correctly and normalized into proper rows.
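The grouping described above can be pictured with a small sketch. It is not Ayakashi's actual implementation: the helper name toRows is hypothetical, and the sketch assumes each extracted prop arrives as a plain array with one value per match, which may differ from the real extraction format.

```javascript
// Hypothetical helper, not part of Ayakashi: zips per-prop match arrays into rows.
function toRows(extracted) {
    const keys = Object.keys(extracted);
    // The longest list of matches decides how many rows are produced.
    const rowCount = Math.max(0, ...keys.map((key) => extracted[key].length));
    const rows = [];
    for (let i = 0; i < rowCount; i++) {
        const row = {};
        for (const key of keys) {
            // Missing matches are filled with null (an assumption of this sketch).
            row[key] = i < extracted[key].length ? extracted[key][i] : null;
        }
        rows.push(row);
    }
    return rows;
}

// Two props with two matches each become two rows with two columns.
const rows = toRows({
    title: ["First post", "Second post"],
    author: ["alice", "bob"]
});
console.log(rows);
```

Each row then corresponds to one line in the console table, one JSON entry, one CSV line or one SQL record.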

printToConsole

Not exactly a saving script, printToConsole simply prints the data passed to it in a table format.
It is particularly helpful during development.
Use it like any other script by including it in your pipeline after the step that outputs the data you want to print:

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "printToConsole"
    }]
}

saveToJSON

The saveToJSON script persists the data to a json file in a performant way:

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "saveToJSON",
        params: {
            file: "myData.json"
        }
    }]
}

You may configure the filename of the generated json file by passing a file param.
By default it will output a data.json file.

saveToCSV

The saveToCSV script persists the data to a csv file in a performant way:

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "saveToCSV",
        params: {
            file: "myData.csv"
        }
    }]
}

You may configure the filename of the generated csv file by passing a file param.
By default it will output a data.csv file.

saveToSQL

The saveToSQL script saves data in a table format to any of the major SQL engines:

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "saveToSQL",
        params: {}
    }]
}

The following parameters can be configured, all of them optional:

params: {
    dialect?: "mysql" | "mariadb" | "postgres" | "mssql" | "sqlite",
    host?: string,
    port?: number,
    database?: string,
    username?: string,
    password?: string,
    connectionURI?: string,
    table?: string
}

Dialect

You may use any of the following dialects:

  • mysql
  • mariadb
  • postgres
  • mssql
  • sqlite

If no dialect is defined, sqlite is used by default.

Connection options

For every dialect except sqlite, you will probably need to supply connection options.
You may use the host, port, database, username and password options, or construct a connectionURI and use that instead.
With sqlite, only the database option has an effect: it changes the database filename. By default, database.sqlite is used.
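As a sketch of the two connection styles, the pipeline steps below use only the params listed earlier. All hosts, credentials and database names are placeholders, and the connectionURI layout (dialect://username:password@host:port/database) is an assumption based on the common database URI convention rather than something confirmed here.

```javascript
// Option 1: individual connection options (all values are placeholders).
const withSeparateOptions = {
    type: "script",
    module: "saveToSQL",
    params: {
        dialect: "postgres",
        host: "localhost",
        port: 5432,
        database: "scraped",
        username: "scraper",
        password: "secret"
    }
};

// Option 2: a single connectionURI replacing the options above.
// The URI layout is assumed, not taken from the Ayakashi docs.
const withConnectionURI = {
    type: "script",
    module: "saveToSQL",
    params: {
        dialect: "postgres",
        connectionURI: "postgres://scraper:secret@localhost:5432/scraped"
    }
};

// sqlite: only the database option matters; it sets the database filename.
const withSqlite = {
    type: "script",
    module: "saveToSQL",
    params: {
        dialect: "sqlite",
        database: "scraped.sqlite"
    }
};
```

Any one of these objects would take the place of the saveToSQL step in the waterfall shown earlier.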

Table name

You can change the table name in any dialect by specifying the table option.
If no table option is provided, an internal uuid is used as the table name.

dataTypes

Supported dataTypes are the following:

  • booleans
  • integers
  • floats
  • objects
  • strings

Any other dataType will be saved as a string.

Objects are saved in a JSON column if the dialect is set to postgres or sqlite.
For all other dialects, a TEXT column is used.
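The rules above can be summarized in a short sketch. The function columnTypeFor is hypothetical and is not the script's actual implementation. The returned labels are illustrative names for the kinds of column, not the exact SQL types each engine uses, and the sketch treats arrays as objects and null as "any other dataType", both of which are assumptions.

```javascript
// Hypothetical illustration of the dataType rules; not saveToSQL's real code.
function columnTypeFor(value, dialect = "sqlite") {
    if (typeof value === "boolean") return "BOOLEAN";
    if (typeof value === "number") {
        // Whole numbers are integers, everything else is a float.
        return Number.isInteger(value) ? "INTEGER" : "FLOAT";
    }
    if (typeof value === "object" && value !== null) {
        // Objects get a JSON column only on postgres and sqlite.
        return dialect === "postgres" || dialect === "sqlite" ? "JSON" : "TEXT";
    }
    // Strings, and any other dataType, are saved as strings.
    return "STRING";
}

console.log(columnTypeFor(42));                   // INTEGER
console.log(columnTypeFor(4.2));                  // FLOAT
console.log(columnTypeFor({tags: ["a"]}, "mysql")); // TEXT
```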