
Been on SO for a very long time, but new to this site. I am creating a console app that tests API endpoint performance, using JSON files from a specified directory as request bodies. The request JSON files themselves are not large (~5 KB each), but there are a large number of them. Below are the relevant pieces of the logic. I'd like feedback on whether this is a viable way of loading all the JSON files from a testing directory, and on any ways to improve it.

  1. This is a struct to hold a file's JSON data:
    public readonly struct AppFileInfo
    {
        public AppFileInfo(string fullPath, long size)
        {
            FullPath = fullPath;
            Name = Path.GetFileNameWithoutExtension(fullPath);
        }
    
        public AppFileInfo(FileInfo fileInfo)
        {
            FullPath = fileInfo.FullName;
            Name = Path.GetFileNameWithoutExtension(fileInfo.Name);
    
            using var buffer = new ArrayPoolBufferWriter<byte>(ArrayPool<byte>.Shared, 8192);
            using var writer = new Utf8JsonWriter(buffer);
    
            using var stream = File.OpenRead(fileInfo.FullName);
            using var document = JsonDocument.Parse(stream);
                
            document.WriteTo(writer);
    
            Contents = buffer.GetMemory();                               
        }
    
        public string Name { get; init; }
    
        public string FullPath { get; init; }
    
        public ReadOnlyMemory<byte> Contents { get; init; }
    
        public ReadOnlySpan<byte> ContentSpan => Contents.Span;
    };
  2. This is how I would use that object to parse the specified directory:
    public class RequestFileReader
    {
        private AppConfiguration _appConfig;
    
        public RequestFileReader(IOptions<AppConfiguration> appConfig)
        {
            _appConfig = appConfig.Value;
        }
        
        public List<AppFileInfo> LoadRequestsFromDir(string dir)
        {
            List<AppFileInfo> requestFiles = [];
            var fileInfos = new DirectoryInfo(dir).EnumerateFiles().Where(f => f.Extension == ".json");                    
            foreach (var fileInfo in fileInfos)
            {
                var appFileInfo = new AppFileInfo(fileInfo);
                requestFiles.Add(appFileInfo);
    
            }
            return requestFiles;
        }
    }
  3. This is how I would use that JSON data:
    public async Task<List<TestResulta>> Run(string responsesPath)
    {
        var testRequests = _requestFileReader.LoadRequestsFromDir(appConfig.GetMismatchTestPath());

        var requestChunks = testRequests.Chunk(_appConfig.ChunkSize).ToList();
        foreach (var chunk in requestChunks)
        {
            var tasks = chunk.Select(r => CreateResponseTask(r, responsesPath));
            await foreach (var task in Task.WhenEach(tasks))
            {
                testResults.Add(await task);
            }
            await Task.Delay(_appConfig.ChunkInterval);
        }

        return testResults;
    }

    public async Task<MismatchTestComparisonResult> CreateResponseTask(AppFileInfo request, string responsesPath)
    {
        HttpRequestMessage requestMessage = new HttpRequestMessage()
        {
            Method = HttpMethod.Post,
            RequestUri = new Uri(uri),
            Content = new ReadOnlyMemoryContent(request) { Headers = { ContentType = new("application/json") } }
        };

        var client = _httpClientFactory.CreateClient();
        try
        {
            client.DefaultRequestHeaders.Clear();
            client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);
            client.DefaultRequestHeaders.Add("Connection", "keep-alive");
            client.Timeout = TimeSpan.FromMinutes(appConfig.ClientTimeout);
        }

        var response = await client.SendAsync(requestMessage);
    }

Any feedback/suggestions, especially on performance improvements and memory optimizations, would be greatly appreciated.

  • Seems like you accidentally garbled code indentation in the third block when copy-pasting or removing "some details for brevity". Since there are no answers yet, consider making a quick edit to make the snippet human-readable again. Commented May 11 at 0:36

1 Answer


constructors with differing behavior

These two do quite different things:

        public AppFileInfo(string fullPath, long size)
        ...    
        public AppFileInfo(FileInfo fileInfo)

In the first one, it wouldn't hurt to explicitly assign Contents = ReadOnlyMemory<byte>.Empty, for clarity to an engineer who is calling into this.

But more important, in the second one the filesystem side effects are surprising, and no /** doc comment */ explains what will happen. Better to defer the FS activity until caller specifically asks for it by making a method call. It's unclear why it is "ok" for the first one to omit writing to a file; this is best resolved by making the second one not write to a file. The file writing code probably belongs in another class.
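A minimal sketch of that deferral, with illustrative names (`RequestFile` and `WithLoadedContents` are not from the original code; this assumes C# 10+ for `with` on structs):

```csharp
using System;
using System.IO;

public readonly struct RequestFile
{
    public RequestFile(string fullPath)
    {
        FullPath = fullPath;
        Name = Path.GetFileNameWithoutExtension(fullPath);
        Contents = ReadOnlyMemory<byte>.Empty; // explicit: nothing has been read yet
    }

    public string Name { get; init; }
    public string FullPath { get; init; }
    public ReadOnlyMemory<byte> Contents { get; init; }

    // The filesystem read is now an explicit, named operation the caller opts into,
    // instead of a surprising side effect of construction.
    public RequestFile WithLoadedContents() =>
        this with { Contents = File.ReadAllBytes(FullPath) };
}
```

Now the constructor is cheap and side-effect free, which also makes the type trivial to construct in unit tests.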

unused param

There's no need for that first constructor to accept size, given that it's never used.

nit: In the classname AppFileInfo, the "App" prefix is less informative than it could be. Prefer to choose a name drawn from the application's Business Domain.

informative identifiers

This is a pretty good name:

        public List<AppFileInfo> LoadRequestsFromDir(string dir)

Consider improving it by making it slightly longer: LoadJsonRequestsFromDir. This would reflect the hardcoded .json suffix it scans for.

nit: In the foreach body, consider leaving appFileInfo anonymous, so we have just a one-liner of requestFiles.Add(new AppFileInfo(fileInfo)).

memory impact

We eagerly read the entire directory's contents. For a specified dir containing N json files, the return value will consume an extra \$5000 \times N\$ bytes. It's unclear why you wouldn't instead return N empty objects that lazily read from the filesystem when the caller actually needs the result data. That lets the caller discard AppFileInfo objects as it iterates, so its peak memory footprint corresponds to a constant number of files being RAM-resident, rather than all of them being simultaneously resident.

(Or maybe ChunkSize and ChunkInterval are relevant for that? Except the executable code, comments, and Review Context are silent on their magnitude and on the design constraints, beyond saying there is a "large number of [files]" in the folder.)
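A lazy variant could look roughly like this (`EnumerateJsonRequests` is an illustrative name; it assumes the caller iterates the sequence once rather than materializing it):

```csharp
using System.Collections.Generic;
using System.IO;

public IEnumerable<AppFileInfo> EnumerateJsonRequests(string dir)
{
    // yield return defers each AppFileInfo construction (and therefore its
    // file read) until the caller pulls the next element, so Chunk(n) keeps
    // at most n files' contents resident at a time instead of all N.
    foreach (var fileInfo in new DirectoryInfo(dir).EnumerateFiles("*.json"))
    {
        yield return new AppFileInfo(fileInfo);
    }
}
```

`testRequests.Chunk(_appConfig.ChunkSize)` can consume this sequence directly; the key is not to call `.ToList()` before chunking, since that forces everything resident again.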

crazy indenting

This is much harder to read and properly interpret than it should be, and it's not the only such instance.

           client.DefaultRequestHeaders.Accept.Add(new 
           client.MediaTypeWithQualityHeaderValue("application/json"));

Indent that second line a bit more, please. Better, rely on auto-formatter tooling.

accurate identifiers

The last line of CreateResponseTask executes
await client.SendAsync(requestMessage). That is a POLA violation. Either the method name should explain that it produces side effects across the network, or the "create" routine should create the task and then let caller asynchronously send it.

Generally, if you notice your constructor is producing side effects, then your constructor is probably "too big". Write a method that lets caller produce side effects when they're ready for side effects. For example, automated unit tests often wish to create an object with the bare minimum of setup and cleanup required, and side effects can work at cross purposes to that.
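One possible split, keeping construction pure and putting the side effect behind a name that announces it (`CreateRequest` and `SendRequestAsync` are illustrative names; `_httpClientFactory` is assumed from the original code):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Pure factory: builds the request message, touches no network.
public HttpRequestMessage CreateRequest(AppFileInfo request, Uri endpoint) =>
    new(HttpMethod.Post, endpoint)
    {
        Content = new ReadOnlyMemoryContent(request.Contents)
        {
            Headers = { ContentType = new("application/json") }
        }
    };

// The network side effect lives in a method whose name says so.
public async Task<HttpResponseMessage> SendRequestAsync(HttpRequestMessage message)
{
    var client = _httpClientFactory.CreateClient();
    return await client.SendAsync(message);
}
```

A test can now exercise `CreateRequest` in isolation, with no HTTP stack involved.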

  • Thanks for the great feedback; addressing it already! To your point that "the file writing code probably belongs in another class": we never write back to the file. This is a console app to test the performance of a single endpoint of one of our APIs. All the JSON data gets read from files and used to build up requests; the requests are then sent en masse in parallel, and we analyze the metrics and generate a performance report (percentage of successful calls, longest response time, average response time, etc.). This is just the first part of it, where the files are read and requests are built from that data. Commented May 11 at 20:39
  • Next clarification, on "Or maybe ChunkSize and ChunkInterval are relevant for that?": ChunkSize has a default of 1000 and refers to how many requests are sent simultaneously. So if the directory contains (a realistic number) 5000 JSON files, they all get read, and a collection of 5000 JSON requests is built first. That collection is then chunked by 1000, so only 1000 requests get sent and analyzed at a time. I can attach a sample if it helps. That's why performance/memory optimization is important: the JSON files are small, around 3-4 KB, but there are around 5000 of them. Commented May 11 at 20:52
  • These sound like Magic Numbers, each worthy of being assigned to a MANIFEST_CONSTANT which can then carry a comment about the design rationale for the value. // If "limiting client-side memory footprint" is a non-goal, then I imagine a single batch of 5000 requests, rather than five smaller batches, would be fine. Put another way, the OP code doesn't offer a rationale for idling for some interval between batches. As usual, get out your stopwatch. Measure, and see! Commented May 11 at 21:05
  • A Stopwatch is in play; that's one of those "skipped for brevity" things. The batching is literally a requirement: "Analyze endpoint performance 1000 requests at a time". The performance report gets generated daily and is forwarded to people: how many requests failed (500 response), average response time, worst response time, etc. Commented May 11 at 21:17
