I have a CSV file with a lot of information I don't need, and I want to copy only the columns I need into another CSV file.

Current state:

First Name,Middle Name, Last Name, Title, Suffix, Nickname, Given Yomi, Surname Yomi....
Angel,,Romero,,,Romi,, ....

The new file should look something like this:

First Name, Last Name, Nickname
Angel, Romero, Romi

I want to do that with something like cut, but selecting by column name rather than by field number. Something like this:

cut -d',' -f"First Name" file

I know that doesn't work, but is there another way?

  • As you don't need reordering, cut -d ',' -f '1,3,6' is enough. Commented Dec 6, 2021 at 13:26
  • You asked the exact same question. Please edit your original question instead of opening a new one. Commented Dec 6, 2021 at 13:43
  • "I want to do that by using something like cut and the Column names, not just the Field numbers." Why? If that's really what you want, try SQL. It should be reasonably easy to import into any small DB. Commented Dec 6, 2021 at 14:12
  • The csvcut command from csvkit does exactly this: csvcut -Sc 'First Name','Last Name' file.csv Commented Dec 6, 2021 at 15:14
  • @glennjackman I tried that, but it always tells me 'csvcut: command not found' even though I installed csvkit. Commented Dec 6, 2021 at 15:31

5 Answers

Miller can select columns by name. The -o flag keeps them in the order you list them:

mlr --csv cut -o -f "field A","field B" input.csv >output.csv

See the Miller documentation for the cut verb.

2 Comments

  • For some reason I can't install the tool.
  • @MahmoudAbdulkarim What operating system are you using? What errors did you get during the installation?

$ cat csvcut.awk
# csvcut.awk

function csvsplit(str, arr,     i,j,n,s,fs,qt) {
    # split comma-separated fields into arr; return number of fields in arr
    # fields surrounded by double-quotes may contain commas;
    #     doubled double-quotes represent a single embedded quote
    delete arr; s = "START"; n = 0; fs = ","; qt = "\""
    for (i = 1; i <= length(str); i++) {
        if (s == "START") {
            if (substr(str,i,1) == fs) { arr[++n] = "" }
            else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
            else { j = i; s = "INFIELD" } }
        else if (s == "INFIELD") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
        else if (s == "INQUOTES") {
            if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
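        else if (s == "MAYBEDOUBLE") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j-1)
                gsub(qt qt, qt, arr[n]); j = 0; s = "START" }
            # a second quote means an embedded quote: still inside the field
            else if (substr(str,i,1) == qt) { s = "INQUOTES" } } }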
    if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
    else if (s == "MAYBEDOUBLE") {
        arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
    else if (s == "START") { arr[++n] = "" }
    return n }

BEGIN { # read and store output field names
    for (i=1; i<ARGC; i++) { fields[++nfields] = ARGV[i]; ARGV[i] = "" } }

NR == 1 { # read and store input field names, write output header
    for (i=1; i<=csvsplit($0,arr); i++) { names[arr[i]] = i }
    for (i=1; i<=nfields; i++) { printf "%s%s", sep, fields[i]; sep = "," }
    printf "\n" }

NR > 1 { # read input record, split fields, write output record
    delete csv; sep = ""; n = csvsplit($0, csv)
    for (i=1; i<=nfields; i++) {
        printf "%s%s", sep, csv[names[fields[i]]]; sep = "," }
    printf "\n" }
$ cat mahmoud.input
FirstName,MiddleName,LastName,Title,Suffix,Nickname,GivenYomi,SurnameYomi
Angel,,Romero,,,Romi,,
$ awk -f csvcut.awk FirstName LastName Nickname <mahmoud.input
FirstName,LastName,Nickname
Angel,Romero,Romi

awk -v tags='First Name,Last Name,Nickname' '
    BEGIN {
        FS=", *"; OFS=", "
        numOutFlds = split(tags,outFldNr2tag)
    }
    NR==1 {
        for (inFldNr=1; inFldNr<=NF; inFldNr++) {
            tag = $inFldNr
            tag2inFldNr[tag] = inFldNr
        }
    }
    {
        for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
            tag = outFldNr2tag[outFldNr]
            inFldNr = tag2inFldNr[tag]
            val = $inFldNr
            printf "%s%s", val, (outFldNr<numOutFlds ? OFS : ORS)
        }
    }
' file
First Name, Last Name, Nickname
Angel, Romero, Romi

If your file is a regular CSV without the variable spacing after the commas, you can use Ruby's CSV parser directly, without cleaning the file first.

Given:

cat file
First Name,Middle Name,Last Name,Title,Suffix,Nickname,Given Yomi,Surname Yomi
Angel,,Romero,,,Romi,,

You can just filter each csv row:

ruby -r csv -e 'BEGIN{wanted=["First Name", "Last Name", "Nickname"]
                      puts wanted.to_csv
                      }
CSV.parse($<.read, headers:true).each{
    |h| puts h.to_hash.select{
    |k,v| wanted.include?(k) }.values.to_csv}' file

Prints:

First Name,Last Name,Nickname
Angel,Romero,Romi

The advantage here is that full CSV syntax is supported, including quoted fields with embedded delimiters.
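
As a rough sketch of the same idea in script form: the helper name select_columns is made up for illustration, and stripping whitespace from the header names is an assumption added to cope with headers like the one in the question. Unlike the one-liner, it returns the columns in the order you ask for rather than in file order.

```ruby
require "csv"

# Hypothetical helper (not part of any library): keep only the named
# columns, in the order requested, and return the result as CSV text.
def select_columns(csv_text, wanted)
  # Assumption: header names may carry stray spaces, so strip them.
  table = CSV.parse(csv_text, headers: true,
                    header_converters: ->(h) { h.strip })
  out = [wanted.to_csv]
  # values_at looks each field up by header name, so the order follows `wanted`.
  table.each { |row| out << row.values_at(*wanted).to_csv }
  out.join
end

# Usage, assuming the sample file shown above:
#   puts select_columns(File.read("file"), ["First Name", "Last Name", "Nickname"])
```

Fields that contain commas are quoted again on output, because to_csv handles the quoting.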

This may be late and isn't very general, but it is very simple if you don't need to reuse the script:

awk 'BEGIN {FS=", *"; OFS=","}{print $1,$3,$6}' input.csv > output.csv
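
If you go the positional route but would rather not count columns by hand, the field numbers can be looked up from the header line. This is only a sketch: field_numbers is a hypothetical helper, and it assumes the header itself contains no quoted commas preceded by spaces.

```ruby
require "csv"

# Hypothetical helper: map column names to 1-based field numbers,
# given the header line of the CSV file.
def field_numbers(header_line, wanted)
  # Strip each name so ", Last Name" in the header still matches "Last Name".
  names = CSV.parse_line(header_line).map { |h| h.to_s.strip }
  wanted.map do |name|
    idx = names.index(name) or raise ArgumentError, "no column named #{name.inspect}"
    idx + 1
  end
end

header = "First Name,Middle Name, Last Name, Title, Suffix, Nickname, Given Yomi, Surname Yomi"
puts field_numbers(header, ["First Name", "Last Name", "Nickname"]).join(",")
# prints 1,3,6
```

The resulting list can then be used as the field numbers in the awk command above or in cut -f.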
