Is there an existing issue for this?
Current Behavior
When fetching an extremely large dataset, CQ does not appear to write to the DB until the entire resolver has completed.
This is fine when you fetch all the data at once, as in your Airtable example:
const resolver: TableResolver = async (clientMeta, parent, stream) => {
  const airtableClient = new Airtable({ apiKey, endpointUrl }).base(baseId);
  const records = await airtableClient(table.name).select().all();
  for (const record of records) {
    const recordAsObject = Object.fromEntries(table.fields.map((field) => [field.name, record.get(field.name)]));
    stream.write(recordAsObject);
  }
  return;
};
However, with an extremely large dataset, we need to stream the data as it comes in to avoid massive memory usage (and crashes).
For example:
const response = await apiClient.get(`/myapi`);
const rows = response.data.result;
if (rows.length > 0) {
  rows.forEach((row: TableData) => stream.write(row));
}
// Pagination logic looping continues
However, despite this, we do not see the resources synced until after the table resolver has returned.
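To make the pattern concrete, here is a minimal, self-contained sketch of the page-by-page loop I mean. It does not use the real SDK types: `Row`, `RowStream`, `Page`, `FetchPage`, `resolvePaginated`, and `makeFakeApi` are all hypothetical names made up for illustration, and the cursor-based paging is an assumption about the API. It only shows the resolver side (each page is written as soon as it arrives); it says nothing about when CQ flushes those rows to the destination.

```typescript
// Hypothetical stand-ins for illustration only; these are NOT the SDK's types.
type Row = Record<string, unknown>;

interface RowStream {
  write(row: Row): void;
}

interface Page {
  result: Row[];
  nextCursor: string | undefined; // undefined means there are no more pages
}

type FetchPage = (cursor?: string) => Promise<Page>;

// Fetches one page at a time and writes its rows before requesting the next
// page, so only a single page is held in memory by the resolver.
async function resolvePaginated(fetchPage: FetchPage, stream: RowStream): Promise<number> {
  let cursor: string | undefined = undefined;
  let written = 0;
  do {
    const page: Page = await fetchPage(cursor);
    for (const row of page.result) {
      stream.write(row);
      written++;
    }
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return written;
}

// In-memory fake API (assumption: cursor is the index of the next row).
function makeFakeApi(totalRows: number, pageSize: number): FetchPage {
  return async (cursor?: string): Promise<Page> => {
    const start = cursor === undefined ? 0 : Number(cursor);
    const end = Math.min(start + pageSize, totalRows);
    const result: Row[] = [];
    for (let id = start; id < end; id++) {
      result.push({ id });
    }
    return { result, nextCursor: end < totalRows ? String(end) : undefined };
  };
}
```

In the real plugin, the body of `resolvePaginated` would sit inside the table resolver and `stream` would be the one passed in by the SDK.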
Expected Behavior
I expect stream.write to hand each object off for batching immediately, rather than waiting for the table resolver to return.
CloudQuery (redacted) config
custom plugin
Steps To Reproduce
No response
CloudQuery (redacted) logs
no relevant logs
CloudQuery version
cloudquery version 6.19.1
Additional Context
No response
Pull request (optional)