bug: JS SDK - not properly supporting streaming #20910

@duncanmapes

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When fetching an extremely large dataset, CloudQuery does not appear to write to the DB until the entire resolver has completed.

This is fine when you fetch all the data at once, like in your Airtable example:

const resolver: TableResolver = async (clientMeta, parent, stream) => {
  const airtableClient = new Airtable({ apiKey, endpointUrl }).base(baseId);
  const records = await airtableClient(table.name).select().all();
  for (const record of records) {
    const recordAsObject = Object.fromEntries(table.fields.map((field) => [field.name, record.get(field.name)]));
    stream.write(recordAsObject);
  }
};

However, when you have an extremely large dataset, we need to stream the data as it comes in to prevent massive memory usage (and crashes).

For example:

const response = await apiClient.get(`/myapi`);
const rows = response.data.result;

if (rows.length > 0) {
  rows.forEach((row: TableData) => stream.write(row));
}

// Pagination logic looping continues

Despite this, we do not see the resources synced until after the table resolver has returned.
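To make the pattern concrete, here is a self-contained sketch of the paginated resolver shape described above. The `fetchPage` mock and `RowStream` interface are stand-ins for illustration, not the real CloudQuery SDK or API client: the point is that each page is written to the stream as soon as it arrives, and the expectation is that the SDK batches those writes instead of buffering everything until the resolver returns.

```typescript
type TableData = Record<string, unknown>;

// Stand-in for the SDK's stream argument (assumption, not the real interface).
interface RowStream {
  write(row: TableData): void;
}

// Mock paginated API standing in for apiClient: two rows per page, five total.
async function fetchPage(
  cursor: number
): Promise<{ rows: TableData[]; next: number | null }> {
  const pageSize = 2;
  const total = 5;
  const rows: TableData[] = [];
  for (let i = cursor; i < Math.min(cursor + pageSize, total); i++) {
    rows.push({ id: i });
  }
  return { rows, next: cursor + pageSize < total ? cursor + pageSize : null };
}

// Resolver that writes each page to the stream as soon as it arrives,
// rather than accumulating the full dataset in memory first. The issue is
// that writes only appear to flush once this function has returned.
async function resolver(stream: RowStream): Promise<void> {
  let cursor: number | null = 0;
  while (cursor !== null) {
    const page = await fetchPage(cursor);
    for (const row of page.rows) {
      stream.write(row);
    }
    cursor = page.next;
  }
}
```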

Expected Behavior

I expect stream.write to queue the object for batching immediately, not wait for the table resolver to return.

CloudQuery (redacted) config

custom plugin

Steps To Reproduce

No response

CloudQuery (redacted) logs

no relevant logs

CloudQuery version

cloudquery version 6.19.1

Additional Context

No response

Pull request (optional)

  • I can submit a pull request
