1

I have a dataset (originally large CSV) that I filtered using duckdb and dbplyr.

This is a small script that get to my idea :

library(duckdb) 
library(DBI)
library(dplyr)
library(dbplyr)

path_data_csv = "data.csv"
data = duckdb::tbl_file(DBI::dbConnect(duckdb()), path_data_csv)
new_data = data |> 
  filter(datasetKey == '12345678', 
         kingdom %in% c('Animalia')) |> 
  collect()

Then since the data is in memory, I can export it using write.csv() :

write.csv(x = new_data, 
          file = 'newdata.csv', 
          row.names = FALSE)

But I was wondering if there is a way to export it directly into CSV without 'collecting' the data first. Essentially, I want to export the data of class c("tbl_duckdb_connection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl") to CSV without materialization. I wish to find something that is not requiring to use dbExecute or dbWriteTable (which would be ways to do it).

3
  • 4
    duckplyr.tidyverse.org/reference/compute_csv.html? Commented Sep 14 at 22:40
  • If you are simply reading a csv, performing a simple filter and writing back out I wonder if you really need R at all. Using just duckdb from the Windows cmd line: duckdb --csv -c "from 'beausoliel.csv' where datasetKey = '12345678' and kingdom = 'Animalia'" On Linux you may need different quoting. Commented Sep 15 at 13:11
  • Indeed! And I've used duckdb for this. I think in particular programming communities (especially people starting) it might be good to show this all in R. Hence the search for a function that does this. For this, compute_csv() does the job! Commented Sep 15 at 13:56

1 Answer 1

1

We can use dbplyr::sql_render() to generate the query and then construct a DuckDB copy statement to export the results to a CSV file.

library(duckdb) 
library(DBI)
library(dplyr)
library(dbplyr)

path_data_csv = "data.csv"
write.csv(iris, path_data_csv)  ## example dataset

data = duckdb::tbl_file(DBI::dbConnect(duckdb()), path_data_csv)
con <- DBI::dbConnect(duckdb())

data %>% 
  dplyr::filter(Sepal.Length > 4.5, 
         Species %in% c('virginica')) %>% 
  dbplyr::sql_render() %>% 
  as.character() %>% 
  paste0("COPY (", ., ") TO 'newdata.csv' (FORMAT CSV, HEADER)") %>% 
  DBI::dbExecute(con, .)
#> [1] 50

Or as Jon mentioned in the comments, load/install {duckplyr} to take advantage of its compute_csv() function to execute the query and write the results directly to CSV; it also returns a duckplyr_df object for potential further operations. Note that you need to convert to "duckplyr_df" using duckplyr::as_duckdb_tibble() first.

library(duckplyr)

data %>%  
  dplyr::filter(Sepal.Length > 4.5, 
         Species %in% c('virginica')) %>% 
  duckplyr::as_duckdb_tibble() %>% 
  duckplyr::compute_csv("newdata.csv",
                        options = list(header = TRUE, delimiter = ","))

#> # A duckplyr data frame: 6 variables
#>    column0 Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
#>      <dbl>        <dbl>       <dbl>        <dbl>       <dbl> <chr>    
#>  1     101          6.3         3.3          6           2.5 virginica
#>  2     102          5.8         2.7          5.1         1.9 virginica
#>  3     103          7.1         3            5.9         2.1 virginica
#>  4     104          6.3         2.9          5.6         1.8 virginica
#>  5     105          6.5         3            5.8         2.2 virginica
#>  6     106          7.6         3            6.6         2.1 virginica
#>  7     107          4.9         2.5          4.5         1.7 virginica
#>  8     108          7.3         2.9          6.3         1.8 virginica
#>  9     109          6.7         2.5          5.8         1.8 virginica
#> 10     110          7.2         3.6          6.1         2.5 virginica
#> # ℹ more rows

Created on 2025-09-14 with reprex v2.1.1

Sign up to request clarification or add additional context in comments.

3 Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.