
Hello, experts!

I encountered the following problem: when trying to read a large file from a PostgreSQL database in a Spring Boot application using the Spring Data JPA framework, all the data is loaded into memory, even though the Blob class specification says otherwise:

By default, drivers implement Blob using an SQL locator (BLOB), which means that a Blob object contains a logical pointer to the SQL BLOB data rather than the data itself.

Details below:

@RestController
class LargeFileController(
    val fileRepo: FileContentRepo
) {
    @PostMapping("/file", consumes = [MediaType.MULTIPART_FORM_DATA_VALUE])
    fun saveFile(@RequestBody file: MultipartFile) {
        fileRepo.save(LargeFileContent(file))
    }

    @GetMapping("/file/{id}", produces = [MediaType.APPLICATION_OCTET_STREAM_VALUE])
    @Transactional
    fun getFileById(@PathVariable("id") id: Long): InputStreamResource {
        val file = fileRepo.findById(id).orElseThrow()
        // Here I can see that all the data is loaded into memory (regardless of file size)
        // in the InputStream buffer (see the screenshot note below).
        // If I shut down the database at this point, I still have access to the entire contents of the file.
        val content = file.content

        return InputStreamResource {
            file.content.binaryStream
        }
    }
}

@Entity
@Table(name = "file_content")
class LargeFileContent (
    @Column(nullable = false)
    @Lob
    @JdbcTypeCode(java.sql.Types.BINARY)
    val content: Blob,

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    val id: Long? = null,
) {
    constructor(file: MultipartFile) : this(
        content = BlobProxy.generateProxy(file.inputStream, file.size)
    )
}

@Repository
interface FileContentRepo: JpaRepository<LargeFileContent, Long> {
}

(Screenshot: the entire file contents end up in the buf field of the InputStream.)

My idea was to save the InputStream from the database to a temporary file on disk and then return the InputStream from that file to the client (so as not to hold up the database connection while the client reads). Is it possible to read the data from the Blob in chunks, rather than loading the entire contents into memory at once?

Thanks in advance for your answers!

2 Comments
  • PostgreSQL's bytea is actually not a BLOB (at least in terms of Oracle DB); see for example cybertec-postgresql.com/en/…: "when you read or write a bytea, all data have to be stored in memory (no streaming support)" Commented Nov 3 at 14:16
  • Thanks, Andrey! The resource you provided was truly helpful. I found a solution: instead of using bytea and Blob in my entities, I use the Large Object API. Details below. Commented Nov 5 at 14:51

3 Answers


I found a solution: I should use the Large Object API for PostgreSQL instead of using bytea and Blob in my entities. Thanks to Andrey, who provided a very useful resource in the comments.

Here is working code that lets you get a stream from a database object. To avoid the "slow client" problem and connection-pool exhaustion, the file is first copied from the database into a temporary file on disk, and a stream from that file is returned to the client. After it has been read, the temporary file is deleted.

// the service layer is deliberately omitted
@RestController  
class LargeFileController(  
    private val fileRepo: LargeFileContentRepo,  
    private val dataSource: DataSource,  
) {  
  
    @PostMapping("large/file", consumes = [MediaType.MULTIPART_FORM_DATA_VALUE])  
    @Transactional // working with large objects requires an active transaction
    fun saveFile(@RequestBody file: MultipartFile) {  
        val conn = DataSourceUtils.getConnection(dataSource)  
        try {  
            val pgConn = conn.unwrap(org.postgresql.PGConnection::class.java)  
            val lobj = pgConn.largeObjectAPI  
  
            val oid = lobj.createLO(LargeObjectManager.WRITE)  
            val obj = lobj.open(oid, LargeObjectManager.WRITE)  
  
            file.inputStream.use { input ->  
                input.copyTo(obj.outputStream)  
            }  
  
            obj.close()  
  
            val entity = LargeFile(  
                oid = oid,  
                fileName = file.originalFilename,  
                size = file.size,  
                mimeType = file.contentType  
            )  
            fileRepo.save(entity)  
        } finally {  
            DataSourceUtils.releaseConnection(conn, dataSource)  
        }  
    }  
  
    @GetMapping("large/file/{id}", produces = [MediaType.APPLICATION_OCTET_STREAM_VALUE])  
    @Transactional  
    fun getFileById(@PathVariable("id") id: Long, response: HttpServletResponse): ResponseEntity<InputStreamResource> {  
        val file = fileRepo.findById(id).orElseThrow()  
        val tempFile = Files.createTempFile(UUID.randomUUID().toString(), file.fileName!!)
        return try {  
            val conn: Connection = DataSourceUtils.getConnection(dataSource)  
            try {  
                val pgConn = conn.unwrap(org.postgresql.PGConnection::class.java)  
                val lobj = pgConn.largeObjectAPI  
  
                val obj = lobj.open(file.oid, LargeObjectManager.READ)  
                val inputStream = obj.inputStream  
  
                inputStream.use { lois ->  
                    tempFile.outputStream().use { fileOutputStream ->  
                        IOUtils.copy(lois, fileOutputStream)  
                    }  
                }  
                obj.close()  
  
                val contentDisposition = ContentDisposition.builder("attachment")  
                        .filename(file.fileName, StandardCharsets.UTF_8)  
                        .build()  
                ResponseEntity.ok()  
                    .header(HttpHeaders.CONTENT_DISPOSITION, contentDisposition.toString())  
                    .header(HttpHeaders.CONTENT_LENGTH, tempFile.fileSize().toString())  
                    .contentType(MediaType.APPLICATION_OCTET_STREAM)  
                    .body(InputStreamResource(tempFile.inputStream()))  
                // or stream directly into the servlet response instead of returning a ResponseEntity:

                /*tempFile.inputStream().use { input ->
                    val contentDisposition = ContentDisposition.builder("attachment")
                        .filename(file.fileName, StandardCharsets.UTF_8)
                        .build()
                    response.contentType = MediaType.APPLICATION_OCTET_STREAM_VALUE
                    response.addHeader(HttpHeaders.CONTENT_DISPOSITION, contentDisposition.toString())
                    response.outputStream.use { output ->
                        IOUtils.copy(input, output)
                    }
                }*/
            } finally {  
                DataSourceUtils.releaseConnection(conn, dataSource)  
            }  
        } finally {  
            try {  
                if (Files.exists(tempFile)) {  
                    Files.deleteIfExists(tempFile)  
                }  
            } catch (ex: Exception) {  
                log.warn("Failed to delete temp file: ${tempFile.toAbsolutePath()}", ex)  
            }  
        }  
    }  
  
    companion object {  
        private val log = LoggerFactory.getLogger(LargeFileController::class.java)  
    }  
}

Full example here

UPDATE:

My mistake was that I was trying to store a Blob in a bytea column. All I had to do was change the column type in the database from bytea to bigint. JPA then works correctly with the Blob (it stores the OID of the large object in that column).
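
For reference, a minimal sketch of the implied schema change (table and column names are assumed from the entity below; it also assumes the column holds no bytea data yet, since existing contents would need a separate conversion, e.g. via lo_from_bytea):

// Hypothetical one-off migration, executed here through a JdbcTemplate purely for illustration:
// replace the bytea column with a bigint column that will hold the OID of the large object.
jdbcTemplate.execute("ALTER TABLE file_content DROP COLUMN oid")
jdbcTemplate.execute("ALTER TABLE file_content ADD COLUMN oid bigint")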

Here is a complete correct example:


@Entity
@Table(name = "file_content")
class FileContent (
    // The column type in the database must be bigint, not bytea. This field will store the OID of the created large object.
    @Lob
    val oid: Blob,

    val name: String?,

    val mimeType: String?,

    val size: Long,

    @OneToMany(cascade = [CascadeType.ALL], mappedBy = "file")
    @BatchSize(size = 20)
    val tags: List<FileTag>,

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    val id: Long? = null,
) {
    constructor(file: MultipartFile) : this(
        oid = BlobProxy.generateProxy(file.inputStream, file.size),
        name = file.originalFilename!!,
        mimeType = file.contentType!!,
        size = file.size,
        tags = listOf()
    )
}

@Service
class FileService(
    private val fileRepo: FileRepo
) {

    @Transactional
    fun getFile(id: Long): Pair<FileContent, File> {
        val fileInfo = fileRepo.findById(id).orElseThrow { RuntimeException("File not found") }
        val largeObj = fileInfo.oid
        val tmpFile = Files.createTempFile(fileInfo.name, null)
        tmpFile.outputStream().use { out ->
            largeObj.binaryStream.use {
                it.copyTo(out)
            }
        }
        return fileInfo to tmpFile.toFile()
    }

    fun saveFile(file: MultipartFile): Long {
        return fileRepo.save(FileContent(file)).id!!
    }

}

@RestController
class BlobObjectController(
    val fileService: FileService
) {

    @GetMapping("/file/{id}", produces = [MediaType.APPLICATION_OCTET_STREAM_VALUE])
    fun getFileById(@PathVariable("id") id: Long): ResponseEntity<StreamingResponseBody> {
        val (metadata, tempFile) = fileService.getFile(id)
        val contentDisposition = ContentDisposition.builder("attachment")
            .filename(metadata.name, StandardCharsets.UTF_8)
            .build()
        val contentType = metadata.mimeType
            ?.let { runCatching { MediaType.valueOf(it) }.getOrNull() }
            ?: MediaType.APPLICATION_OCTET_STREAM

        val responseBody = StreamingResponseBody { outputStream ->
            tempFile.inputStream().use { input ->
                input.copyTo(outputStream)
            }

            val deleted = Files.deleteIfExists(tempFile.toPath())
            log.info("File ${if (deleted) "was deleted" else "WAS NOT DELETED"} after response sent: ${tempFile.name}")
        }
        return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_DISPOSITION, contentDisposition.toString())
            .contentLength(metadata.size)
            .contentType(contentType)
            .body(responseBody)

    }

    @PostMapping("/file", consumes = [MediaType.MULTIPART_FORM_DATA_VALUE])
    fun saveFile(@RequestBody file: MultipartFile) {
        fileService.saveFile(file)
    }

    companion object {
        private val log = LoggerFactory.getLogger(BlobObjectController::class.java)
    }
}


I don't quite understand the reason behind saving a large file in your database when you want to save it as a temp file first and then process it. Why take the overhead of reading the file from the database first? Just save it in files and read it from there. Furthermore, I think it is possible to read a large file from the DB in chunks; you may try this:

YourEntity entity = entityManager.find(YourEntity.class, id);
Blob blob = entity.yourBlobColumn();
try (InputStream in = blob.getBinaryStream()) {
    byte[] buffer = new byte[8192]; // 8 KB buffer
    int bytesRead;
    while ((bytesRead = in.read(buffer)) != -1) {
        // process or write chunk
    }
}

1 Comment

As I said earlier, unfortunately I don't have anywhere to store files other than the database. In your example, when you call blob.getBinaryStream(), the entire file content is loaded into memory (I described this in the question), so this solution does not work. Saving and reading files locally is also not an option when there is more than one application instance, because data consistency is not ensured, i.e. external storage is required (my case). The idea of storing the data in a temporary file is a way to avoid the problem of a "slow-reading" client.

Looks like you have over-engineered something. There is no need to use the PG API:

public class BlobTestIT {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Test
    @Transactional
    void test_blob() throws Exception {

        byte[] bytes = new byte[1000];
        Arrays.fill(bytes, (byte) 'a');

        Path tempFile = Files.createTempFile("test_blob", ".bin");
        try (OutputStream os = Files.newOutputStream(tempFile)) {
            os.write(bytes);
        }

        Long blobId;
        try (InputStream is = Files.newInputStream(tempFile)) {
            blobId = jdbcTemplate.query("select ?", ps -> {
                ps.setBlob(1, BlobProxy.generateProxy(is, tempFile.toFile().length()));
            }, (rs, rowNum) -> rs.getLong(1)).get(0);

            assertThat(blobId).isNotNull();
        }


        Blob blob = jdbcTemplate.query("select ?", ps -> {
            ps.setLong(1, blobId);
        }, (rs, rowNum) -> rs.getBlob(1)).get(0);

        assertThat(blob).isNotNull();

        try (InputStream is = blob.getBinaryStream()) {
            byte[] read = IOUtils.toByteArray(is);
            assertThat(read)
                    .containsExactly(bytes);
        }
    }

}
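
Applied to the download path from the question, the same idea could look roughly like the sketch below (class, table and column names are assumed from the entities above; it reuses the temp-file approach so the transaction can be closed before the client starts reading):

import java.nio.file.Files
import java.nio.file.Path
import org.springframework.jdbc.core.JdbcTemplate
import org.springframework.jdbc.core.RowCallbackHandler
import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Transactional

@Service
class JdbcFileService(
    private val jdbcTemplate: JdbcTemplate
) {

    // The bigint oid column is read back as java.sql.Blob; on PostgreSQL this Blob is
    // backed by the Large Object API, so its binary stream can be copied out in chunks
    // instead of being materialized in memory.
    @Transactional // the large-object stream is only usable inside an active transaction
    fun copyToTempFile(id: Long): Path {
        val tempFile = Files.createTempFile("download-", ".bin")
        jdbcTemplate.query(
            "select oid from file_content where id = ?",
            RowCallbackHandler { rs ->
                val blob = rs.getBlob(1) // a logical pointer to the large object, not the data
                blob.binaryStream.use { input ->
                    Files.newOutputStream(tempFile).use { output ->
                        input.copyTo(output) // streams the content chunk by chunk
                    }
                }
                blob.free()
            },
            id
        )
        return tempFile
    }
}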

1 Comment

This is a tricky (a select query with simultaneous creation of a Blob object) but rather elegant solution. getBlob() and setBlob(blob) work with the Large Object API under the hood. Thanks a lot!
