Optimize Postgres dataWriter.go by Implementing Batch Insert and Update Operations #1252

@tolgaozen

Problem Description

Currently, in the internal/storage/postgres/dataWriter.go file, the INSERT and UPDATE operations against the RelationTuplesTable are executed one tuple at a time. This is inefficient for large writes: every statement costs a separate network round-trip and places additional load on the database server.

Proposed Solution

Implement batch processing for INSERT and UPDATE operations using pgx batch capabilities. This will reduce the number of round-trips to the database and improve the overall performance of the data writing process.

Implementation Details

  1. Batch Inserts:

    • Utilize pgx.Batch to queue multiple INSERT statements and execute them in a single batch. This is particularly useful when inserting multiple tuples into RelationTuplesTable.
    • Example:
      batch := &pgx.Batch{}
      titer := tb.Write.CreateTupleIterator()
      for titer.HasNext() {
          t := titer.GetNext()
          batch.Queue("INSERT INTO relation_tuples (fields...) VALUES ($values...)",
                      t.GetEntity().GetType(), t.GetEntity().GetId(), ...)
      }
      br := w.database.SendBatch(ctx, batch)
      defer br.Close()
      // Exec reads the result of one queued statement, so call it once per Queue.
      for i := 0; i < batch.Len(); i++ {
          if _, err := br.Exec(); err != nil {
              return err
          }
      }
  2. Batch Updates:

    • Similarly, accumulate multiple UPDATE conditions into a single pgx.Batch and execute them together to minimize database interactions.
    • Example:
      batch = &pgx.Batch{}
      for _, condition := range deleteClauses {
          query, args, err := squirrel.Update("relation_tuples").Set(...).Where(...).ToSql()
          if err != nil {
              return err
          }
          batch.Queue(query, args...)
      }
      br = w.database.SendBatch(ctx, batch)
      defer br.Close()
      // Again, read one result per queued UPDATE.
      for i := 0; i < batch.Len(); i++ {
          if _, err := br.Exec(); err != nil {
              return err
          }
      }

Benefits

  • Reduced Network Latency: Minimizes the number of database round-trips required for executing multiple queries.
  • Enhanced Performance: Improves the throughput of database operations, especially under high load.
  • Resource Optimization: Decreases the CPU and memory footprint on both the application and database server.

Considerations

  • Ensure proper transaction management and error handling so that a failure partway through a batch surfaces the underlying error and rolls back cleanly.
  • Conduct thorough testing to validate the performance improvements and ensure data integrity.

This change is expected to significantly enhance the efficiency of data handling in the system, particularly when processing large datasets or under heavy transactional load.
