Write csv not save all lines of dataframe #3783

@miyake-san

Describe the bug
When I try to save a DataFrame as CSV, only around 400K lines are saved, although the data has more than 1M lines.

To Reproduce
My code:

use datafusion::prelude::*;
use log::{debug, info, LevelFilter, trace};
use crate::datapipeline::data_utils::*;
pub mod datapipeline;
use datafusion::logical_plan::when;

use datafusion::arrow::datatypes::DataType::{Int64,Utf8};
#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
  let ctx: SessionContext = SessionContext::new();
  let raw_fato_path: &str = "data/minilake/raw/fato_census/Data8277.csv";
  let stage_fato_path: &str = "data/minilake/stage/fato_census/";
  let fato_census_df = ctx.read_csv(raw_fato_path,  
                                  CsvReadOptions::new()).await?;
  
  let fato_census_df = fato_census_df.with_column("area",cast(
    col("Area"),
    Utf8))?;

  let fato_census_df = fato_census_df
    //.with_column("Area",concat_ws("-", &vec![lit("A"),col("Area")]))?
    .select(vec![
      col("Year").alias("year"),
      col("Age").alias("age"),
      col("Ethnic").alias("ethnic"),
      col("Sex").alias("sex"),
      col("Area").alias("area"),
      col("count").alias("total_count")
      ])?;
  
  // We can see the ..C values in Count column
  fato_census_df.show_limit(5).await?;
  print_schema_of_dataframe(&fato_census_df).await?;
  // Build an expression that performs the transformation
  let transform_count_data = when(col("total_count")
    .eq(lit("..C")), lit(0_u32))
    .otherwise(col("total_count"))?;

  //Cast column datatype
  let fato_census_df = fato_census_df.with_column(
    "total_count",
    cast(transform_count_data, Int64))?;
  
  fato_census_df.write_csv(stage_fato_path).await?;

  Ok(())
}

Dataset:

Age and sex by ethnic group (grouped total responses), for census usually resident population counts, 2006, 2013, and 2018 Censuses (RC, TA, SA2, DHB)

Expected behavior
All lines should be saved:

image

But only this many lines are saved:
image
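
One way to double-check the saved row count is to sum the lines of every CSV file under the output path. This is only a sketch using the Rust standard library, under two assumptions the issue does not confirm: that `data/minilake/stage/fato_census/` ends up as a directory that may hold more than one CSV file, and that each file starts with a header line. The helper name `count_csv_rows` is hypothetical, and the count is line-based, so quoted fields containing newlines would be over-counted.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Counts lines across every `.csv` file directly inside `dir`.
/// When `has_header` is true, one line per file is treated as a header
/// (assumption: every output file repeats the header row).
fn count_csv_rows(dir: &Path, has_header: bool) -> io::Result<usize> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Skip subdirectories and anything that is not a `.csv` file.
        if !path.is_file() || path.extension().and_then(|e| e.to_str()) != Some("csv") {
            continue;
        }
        let lines = BufReader::new(File::open(&path)?).lines().count();
        total += if has_header { lines.saturating_sub(1) } else { lines };
    }
    Ok(total)
}

fn main() {
    // Output directory taken from the reproduction code in this issue.
    let dir = Path::new("data/minilake/stage/fato_census/");
    match count_csv_rows(dir, true) {
        Ok(rows) => println!("rows across all CSV files: {}", rows),
        Err(err) => eprintln!("could not read {}: {}", dir.display(), err),
    }
}
```

If the total matches the input row count, the rows are present but spread over several files; if it is still short, rows are actually missing from the output.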

Labels: bug
