
I was able to get my columns in CSV format, as shown below:

,col1,col2,col3,col4,col5,,

I used this awk command to get the output in this format:

awk -vORS=, '$0 && p {print $2}; $2 == "name" {p=1}'

and then I used the following two commands to remove the leading comma and the two trailing commas:

   cols=${cols:1}
   cols=${cols:0:${#cols}-2}

Now I get the output in this format:

col1,col2,col3,col4,col5
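Put together, the trimming steps above can be sketched end to end (a sketch only; the sample value of cols is assumed to be the raw awk output):

```shell
# Sketch of the trim steps above: the awk output carries one leading
# comma and two trailing commas, removed with bash substring expansion.
cols=",col1,col2,col3,col4,col5,,"   # assumed sample of the raw awk output
cols=${cols:1}                       # drop the leading comma
cols=${cols:0:${#cols}-2}            # drop the two trailing commas
echo "$cols"                         # col1,col2,col3,col4,col5
```

Note that `${cols:0:${#cols}-2}` is bash/ksh syntax; in plain POSIX sh you would need something like sed instead.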

I want to remove specific columns from the right that match a list. For example if I call the function with parameter "col4,col5", awk should remove the last two columns and print the output like this:

col1,col2,col3

How can this be done in a shell script (preferably with awk or grep or some other such shell-supported command)?

Update: the initial file contents are output as a table, as shown below:

+-----------------------------------------+--------+---------+
| name                                    | type   | comment |
+-----------------------------------------+--------+---------+
| col1                                    | int    |         |
| col2                                    | int    |         |
| col3                                    | string |         |
| col4                                    | string |         |
| col5                                    | string |         |
+-----------------------------------------+--------+---------+
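For reference, the extraction from this table can also be done in a single awk pass, which avoids producing the leading and trailing commas in the first place (a sketch; table.txt is an assumed filename holding the output above, and it assumes no data value contains the word "name"):

```shell
# Sketch: split on '|', skip the border and header rows, trim the cell
# padding, and join the name-column values with commas in one pass.
awk -F'|' '
    NF < 2      { next }                # skip the +---+ border rows
    $2 ~ /name/ { next }                # skip the header row
    {
        gsub(/^[ \t]+|[ \t]+$/, "", $2) # strip padding around the value
        out = out (out == "" ? "" : ",") $2
    }
    END { print out }
' table.txt
```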
  • post the initial file contents Commented Jan 16, 2018 at 16:03
  • If they're in a database, just run a query that requests the fields you want, and use awk -F\| to transform from pipe-separated into CSV.
    – DopeGhoti
    Commented Jan 16, 2018 at 16:10
  • I want a generic script that extracts only the column names without specifying the column names, something like 'select *', but that enforces the order by explicitly generating column names. The above table output is from an impala command Commented Jan 16, 2018 at 16:19
  • xargs and tr are good for avoiding the dangling comma, as an example. grep col /tmp/foo.txt | cut -d"|" -f2 | xargs | tr " " ","
    – gdahlm
    Commented Jan 16, 2018 at 18:24

2 Answers


You can use 'cut' to extract certain columns from delimited data. For example, the command below extracts the last two columns:

echo col1,col2,col3,col4,col5 | cut -d , -f 4,5

prints

col4,col5

The -d argument specifies the delimiter, and -f specifies the index or range of indexes of the fields that you want in your results.
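GNU cut also has a --complement option that inverts the selection, which matches the actual goal here (drop the named fields, keep the rest). This is a GNU extension rather than POSIX, so the sketch below assumes GNU coreutils:

```shell
# GNU cut only: --complement prints every field EXCEPT the listed ones.
echo col1,col2,col3,col4,col5 | cut --complement -d , -f 4,5
# -> col1,col2,col3
```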

EDIT

To make it a little more dynamic, the function below selects the last X columns given a delimiter Y:

function lastCols {
        # $1 = input file, $2 = number of trailing columns, $3 = delimiter
        # count the delimiters in the header line to find the last column index
        endcol=$(($(head -n 1 "$1" | grep -o "$3" | wc -l) + 1))
        startcol=$((endcol - $2 + 1))
        cut -d "$3" -f "$startcol-$endcol" < "$1"
}

lastCols "$1" "$2" "$3"

I've not done much testing on this, so it's likely a little buggy. Use as below:

[]$ cat temp.txt
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5
col1,col2,col3,col4,col5

[]$ ./lastCols.sh temp.txt 2 ,
col4,col5
col4,col5
col4,col5
col4,col5
col4,col5
col4,col5
col4,col5
col4,col5
col4,col5
  • This will remove the 4th and 5th columns, but I will explicitly have to mention the numbers 4 & 5. I'm not sure that the last two columns will always be 4 and 5 for every table. Commented Jan 16, 2018 at 16:20
  • Edited to make column selection dynamic
    – Matt Quinn
    Commented Jan 16, 2018 at 18:12
  • Actually I need to select all the columns except those last columns! :p Commented Jan 16, 2018 at 19:18
  • Haha I’m sure you can juggle it around! :-)
    – Matt Quinn
    Commented Jan 16, 2018 at 20:51
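Since the actual goal is the inverse selection (keep every column except the trailing ones named in a list), one way to juggle it around is a small awk wrapper. This is a sketch, not from either answer, and dropcols is a hypothetical helper name:

```shell
# Hypothetical helper: print the header minus any columns named in a
# comma-separated drop list (e.g. "col4,col5").
dropcols() {
    # $1 = comma-separated header line, $2 = comma-separated drop list
    echo "$1" | awk -v drop="$2" -F, '
    BEGIN { n = split(drop, d, ","); for (i = 1; i <= n; i++) del[d[i]] }
    {
        out = ""
        for (i = 1; i <= NF; i++)
            if (!($i in del))                       # keep unlisted columns
                out = out (out == "" ? "" : ",") $i
        print out
    }'
}

dropcols "col1,col2,col3,col4,col5" "col4,col5"   # -> col1,col2,col3
```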

Here's something I wrote several years ago to solve exactly this problem, back when I was working with OpenStack a lot and got annoyed by the difficult-to-parse output from the OpenStack tools:

#! /usr/bin/perl

# untable.pl
#
# Copyright (C) 2012, 2013 Craig Sanders <[email protected]>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.

# script to strip mysql-style table formatting from nova, keystone,
# glance, etc commands
#
# also works for any tables output from mysql, and from tables produced
# by 'links -dump'
#
# makes the output easily parsable and usable in other scripts.
#
# TODO: command-line option to allow forcing of output style (2-column
# or multi-column) rather than detection.


use strict;

use Carp;
use Getopt::Long;

my $print_headers=0;
my $separator = '';
my $tab = '';
my $result = GetOptions("headers!"  => \$print_headers,
                        "separator=s" => \$separator,
                        "tab" => \$tab,
                       );
$separator = "\t" if ($tab);

my $propval = -1;
our @headers;

while(<>) {
  chomp;
  next if (m/^\+/);

  s/^\|\s*|\s*\|$//iog;  # strip the leading and trailing pipe characters

  my @columns = split '\|';
  # strip leading and trailing spaces
  for my $col (0..scalar @columns-1) {
    if ($columns[$col] eq '') {
      delete $columns[$col];
    } else {
      $columns[$col] =~ s/^\s+|\s+$//iog;
    };
  }

  # find type of table - 2-column Property/Value, or multi-column
  if ($propval == -1) {
    if ($columns[0] eq 'Property') {
      $propval = 1 ;
      $separator = ": " if ($separator eq '');  # default to ': ' unless specified on cmd line
    } else {
      $propval = 0;
      $separator = "\t" if ($separator eq '');  # default to TAB unless specified on cmd line
      @headers = @columns;
      print join($separator,@headers),"\n" if $print_headers;
    };
    next;
  } else {
    print join($separator,@columns),"\n" if (defined $columns[1]);    # skip line unless we have more than one column to output
  }
}

Examples:

Two-Column:

$ keystone tenant-get 93c14424ed06494c832457d974b9505e
+-------------+-----------------------------------------+
|   Property  |                  Value                  |
+-------------+-----------------------------------------+
| description | Anonymous Tenant Description            |
| enabled     | True                                    |
| id          | 93c14424ed06494c832457d974b9505e        |
| name        | ANON1                                   |
+-------------+-----------------------------------------+

$ keystone tenant-get 93c14424ed06494c832457d974b9505e | ./untable.pl
description: Anonymous Tenant Description
enabled: True
id: 93c14424ed06494c832457d974b9505e
name: ANON1

Multi-Column:

$ keystone user-list 810
+-----+---------+-----------------------------+-----------------------------+
|  id | enabled |            email            |             name            |
+-----+---------+-----------------------------+-----------------------------+
| 414 | 1       | [email protected]    | [email protected]    |
| 500 | 1       | [email protected]    | [email protected]    |
| 610 | 1       | [email protected]    | [email protected]    |
| 729 | 1       | [email protected]    | [email protected]    |
+-----+---------+-----------------------------+-----------------------------+

$ keystone user-list 810 | ./untable.pl
414     1       [email protected]    [email protected]
500     1       [email protected]    [email protected]
610     1       [email protected]    [email protected]
729     1       [email protected]    [email protected]
  • Correct me if I'm wrong, but this will only remove the outer table, which I already extracted with my first awk command. Commented Jan 16, 2018 at 17:05
  • It's a tool for getting data out of a mysql-like table into a format that's easily parsed and used by, e.g., awk or sed or perl or other text processing tools. You feed it a table, and it gives you easily-parsed output from which you can extract what you need, without having to rewrite the table-stripping stuff every time. E.g. you're only interested in the column names, so you could pipe the output of this into awk and have the awk script join $1 from all lines. BTW, search on this site for "awk join function" - I've posted on that topic several times, and probably others have too.
    – cas
    Commented Jan 16, 2018 at 17:14
  • E.g. I posted a join() function for awk in unix.stackexchange.com/a/415145/7696 on Jan 6 this year. I guess what I should have stated in my answer is that I think your approach is wrong: instead of doing some of your task in sh, some in awk, and some in sed or whatever, you should do ALL of it in awk or perl or some other real text processing language, and, as necessary, write generic tools to do discrete parts of the job, using sh only to set up the initial conditions and join the tools together in a pipeline.
    – cas
    Commented Jan 16, 2018 at 17:19
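The pipeline these comments describe (untable.pl to strip the borders, then awk to join $1 from all lines) can be sketched as follows. table.txt is an assumed filename, and the join one-liner here is a generic inline version rather than the exact join() function cas links to:

```shell
# Strip the table formatting, then join the first field of every data
# line with commas, reproducing the question's col1,...,col5 output.
./untable.pl < table.txt |
    awk '{ out = out (out == "" ? "" : ",") $1 } END { print out }'
```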
