\$\begingroup\$

I am trying to implement a CSV data plotting program in C#. For the plotting part, I use ScottPlot.NET. There are five columns in the given CSV file, so I use Channel1 to Channel5 to represent these five channels. The following program plots only the first channel's data. The given CSV file has millions of rows. I am wondering how to improve the performance when drawing millions of points.

The experimental implementation

using ScottPlot.WinForms;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace PlottingApp
{
    public partial class Form1 : Form
    {
        private readonly ScottPlot.WinForms.FormsPlot formsPlot1 = new ScottPlot.WinForms.FormsPlot();
        private ScottPlot.Plottables.DataLogger Logger1;

        public Form1()
        {
            InitializeComponent();
        }

        private void button_add_channel_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog = new OpenFileDialog();

            // Set the title and filter for the dialog
            openFileDialog.Title = "Open File";
            openFileDialog.Filter = "CSV files (*.csv)|*.csv|Text files (*.txt)|*.txt|All files (*.*)|*.*";

            // Display the dialog and check if the user clicked OK
            if (openFileDialog.ShowDialog() == DialogResult.OK)
            {
                // Get the selected file name
                string fileName = openFileDialog.FileName;
                var contents = File.ReadAllText(fileName);
                string[] lines = contents.Split(
                    new string[] { Environment.NewLine },
                    StringSplitOptions.None
                );
                foreach (var item in lines)
                {
                    string[] eachData = item.Split(
                        new string[] { "," },
                        StringSplitOptions.None
                    );
                    Channels channels = new Channels();
                    if (!double.TryParse(eachData[0], out channels.Channel1))
                    {
                        continue;
                    }
                    if (!double.TryParse(eachData[1], out channels.Channel2))
                    {
                        continue;
                    }
                    if (!double.TryParse(eachData[2], out channels.Channel3))
                    {
                        continue;
                    }
                    if (!double.TryParse(eachData[3], out channels.Channel4))
                    {
                        continue;
                    }
                    if (!double.TryParse(eachData[4], out channels.Channel5))
                    {
                        continue;
                    }
                    Logger1.Add(channels.Channel1);
                    formsPlot1.Refresh();
                }

            }
            else
            {
                Console.WriteLine("No file selected.");
            }
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            panel1.Controls.Add(formsPlot1);
            formsPlot1.Width = panel1.Width;
            formsPlot1.Height = panel1.Height;
            // create loggers and add them to the plot
            Logger1 = formsPlot1.Plot.Add.DataLogger();
            Logger1.LineWidth = 3;

            
        }
    }
}
  • Channels class

    public class Channels
    {
        public double Channel1;
        public double Channel2;
        public double Channel3;
        public double Channel4;
        public double Channel5;
    
        public override string ToString()
        {
            return Channel1.ToString() + '\t' +
                   Channel2.ToString() + '\t' +
                   Channel3.ToString() + '\t' +
                   Channel4.ToString() + '\t' +
                   Channel5.ToString();
        }
    }
    
\$\endgroup\$

2 Answers

\$\begingroup\$

I am wondering how to improve the performance for the case of drawing millions of points.

Example of how to read and write a CSV file

Be sure to read C# documentation on FileStream. It has ways of dealing with large files.
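
As a rough sketch of the streaming idea (one option among several): `File.ReadLines` enumerates the file lazily, so only one line is held at a time, whereas `File.ReadAllText` followed by `Split` keeps the whole file and every line in memory at once. The names `CsvColumnReader` and `ReadFirstColumn` are made up for illustration. The sketch assumes plain comma-separated numbers with no quoted fields, and unlike the question's code it checks only the first column.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class CsvColumnReader
{
    // Hypothetical helper: reads the file lazily and returns the first column as doubles.
    // Rows whose first field is not a number (headers, blank lines) are skipped.
    public static List<double> ReadFirstColumn(string path)
    {
        var values = new List<double>();
        foreach (string line in File.ReadLines(path)) // streams one line at a time
        {
            int comma = line.IndexOf(',');
            string first = comma < 0 ? line : line.Substring(0, comma);
            if (double.TryParse(first, NumberStyles.Float,
                                CultureInfo.InvariantCulture, out double value))
            {
                values.Add(value);
            }
        }
        return values;
    }
}
```

Parsing with `CultureInfo.InvariantCulture` is a deliberate choice here: it keeps "1.5" meaning one and a half even on machines whose regional settings use a decimal comma.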


ToString Override

Excellent. Thus the object's "instance information" naturally fits into the app's OO design and the .NET framework. For example, Console.WriteLine implicitly calls ToString:

Console.WriteLine(myChannelObject);

And if you create a custom collection class, its ToString can look like this:

StringBuilder me = new StringBuilder();

foreach(var record in this.CSVrecords)
  me.AppendLine(record.ToString());

return me.ToString();

Encapsulate Object Instantiation

Constructor parameters tell you what a new object requires. Otherwise you have to read the source code and hope you didn't miss anything, or make some other mistake that a constructor would easily catch.

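// OpenFileDialog itself has only a parameterless constructor, so an object
// initializer is the closest way to set everything at creation time.
OpenFileDialog openFileDialog = new OpenFileDialog
{
    Title = "Open File",
    Filter = "CSV files (*.csv)|*.csv|Text files (*.txt)|*.txt|All files (*.*)|*.*"
};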

CSV Collection Class

Drastically simplify and encapsulate collection manipulation. For example, take advantage of built-in List<T> functionality to add, sort, and filter records.

MSDN says not to inherit from List<T>, so "have a" List<T> instead. This hides List's extensive public members and lets you control the class's API. This is a "Domain Specific Language" in action.

If client code needs to iterate the collection itself, or even a filtered subset, .NET collections have iterators (an iterator is an object). An iterator enables foreach in the client code.
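
A minimal sketch of such a wrapper, under the assumption that records only need to be added, counted, iterated and printed. The `CSVrecord` type here is a deliberately tiny stand-in so the block compiles on its own, and the member names (`Add`, `Count`) are illustrative choices.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

// Minimal stand-in record type, present only so this sketch is self-contained.
public class CSVrecord
{
    public string Line { get; }
    public CSVrecord(string line) { Line = line; }
    public override string ToString() { return Line; }
}

// "Has a" List<T> rather than "is a" List<T>: only the members below are exposed.
public class CSVrecords : IEnumerable<CSVrecord>
{
    private readonly List<CSVrecord> records = new List<CSVrecord>();

    public int Count { get { return records.Count; } }

    public void Add(CSVrecord record)
    {
        if (record == null) throw new ArgumentNullException(nameof(record));
        records.Add(record);
    }

    // Implementing IEnumerable<T> is what lets client code use foreach.
    public IEnumerator<CSVrecord> GetEnumerator() { return records.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // One record per line, delegating to each record's own ToString.
    public override string ToString()
    {
        var me = new StringBuilder();
        foreach (CSVrecord record in records)
            me.AppendLine(record.ToString());
        return me.ToString();
    }
}
```

Sorting or filtering methods could be added the same way, each one forwarding to the private list, so the public surface grows only as the application needs it.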

CSV Record

There are five columns in the given CSV file

And those columns are named "channel1", etc.?

You will want record-objects to enable/facilitate collection filtering, sorting, testing or enforcing uniqueness, etc.

Because CSV is a well-defined format, you don't need elaborate manipulation to parse out column values (as long as no field contains a quoted comma, which holds for purely numeric data). Every row accounts for all columns, even "blank" values.

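public class CSVrecord {
  // One property per CSV column; each holds the raw text of its field.
  public string channel1 {get; protected set;}
  public string channel2 {get; protected set;}
  public string channel3 {get; protected set;}
  public string channel4 {get; protected set;}
  public string channel5 {get; protected set;}

  // aRecord is one line of the file; a row with fewer than five fields
  // throws IndexOutOfRangeException here rather than failing later.
  public CSVrecord(string aRecord) {
    string[] fields = aRecord.Split(
                    new string[] { "," },
                    StringSplitOptions.None
                 );

    channel1 = fields[0];
    channel2 = fields[1];
    channel3 = fields[2];
    channel4 = fields[3];
    channel5 = fields[4];
  }

  // Writes the record back out as one quoted CSV row.
  public override string ToString() {
    return string.Format("\"{0}\",\"{1}\",\"{2}\",\"{3}\",\"{4}\"",
               channel1, channel2, channel3, channel4, channel5);
  }
}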

CLIENT CODE

That massive if block all but disappears.

CSVrecords records = new CSVrecords();

// reading file stream redacted

foreach (var record in rawRecords)
  records.Add(new CSVrecord(record));

Console.WriteLine(records);
\$\endgroup\$
  • \$\begingroup\$ Thank you for answering. How about the plotting part? The critical performance issue is in the drawing process. \$\endgroup\$
    – JimmyHu
    Commented Jun 7, 2024 at 14:00
  • \$\begingroup\$ Need more info. I don't understand the channel class & plot-types. What kind of graphs? Each channel a separate graph? A million data points for ONE graph? Do you need 10^6 points for curve fitting? Is the plot going to cover a wall? Seriously, For paper, 11.5x8.5 or A4 very few points are needed. And generally this is true anyway. A visual graph is not a precision instrument. It's a data summary, as a practical matter \$\endgroup\$
    – radarbob
    Commented Jun 8, 2024 at 4:57
  • \$\begingroup\$ Thank you for answering. Regarding the "Encapsulate Object Instantiation" point, it seems there is no OpenFileDialog constructor that takes those two parameters. \$\endgroup\$
    – JimmyHu
    Commented Jun 11, 2024 at 11:18
\$\begingroup\$

I suggest using CsvHelper instead of parsing the file yourself. This would save time and effort, as CsvHelper is memory-optimized and covers most cases that you might encounter with CSV serialization and deserialization. It also supports mapping to models out of the box.

Second, I see you're refreshing the plot on each iteration, which suggests you're trying to show the changes in real time. If so, you need to convert the work into asynchronous operations; this avoids user-interface hangs (and has other advantages as well).
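
One possible shape for that, sketched without any UI or ScottPlot dependency so it can run on its own: parse on a worker thread with `Task.Run` and hand the finished array back to the caller. `ChannelLoader` and `LoadFirstChannelAsync` are hypothetical names, and the sketch assumes simple comma-separated numeric rows, reading only the first column.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;

public static class ChannelLoader
{
    // Hypothetical helper: parses the first column on a thread-pool thread,
    // so a UI thread awaiting the result stays responsive meanwhile.
    public static Task<double[]> LoadFirstChannelAsync(string path)
    {
        return Task.Run(() =>
        {
            var values = new List<double>();
            foreach (string line in File.ReadLines(path))
            {
                string[] fields = line.Split(',');
                // Skip rows whose first field is not a number.
                if (double.TryParse(fields[0], NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out double value))
                {
                    values.Add(value);
                }
            }
            return values.ToArray();
        });
    }
}

// Possible use in the question's click handler, if it is declared "async void":
//   double[] data = await ChannelLoader.LoadFirstChannelAsync(fileName);
//   foreach (double y in data) Logger1.Add(y);
//   formsPlot1.Refresh();   // once, after all points are added
```

The part most likely to matter for the question is the last commented line: calling `formsPlot1.Refresh()` once after the loop, instead of once per row, means the plot is redrawn a single time rather than millions of times.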

\$\endgroup\$
  • \$\begingroup\$ Sounds like a good idea. "Mapping to models" sounds interesting but I'm not certain the model (channel class) is optimal for feeding the plotting function. \$\endgroup\$
    – radarbob
    Commented Jun 8, 2024 at 5:03
  • \$\begingroup\$ Mapping models is just one feature out of many. You can configure your mapping with either Attributes or ClassMap. There are many configurations and features that will make working with CSV much easier to maintain. \$\endgroup\$
    – iSR5
    Commented Jun 8, 2024 at 12:34
