
I'm working on a document which incorporates a lot of figures, tables, etc. The PDF file size is creeping up and up, to the point that GitHub now complains about it too.

I've already got a list of the largest images etc. (from ls), which are the likely culprits. There are other elements in the document too, however, such as To-Do notes and on-the-fly typeset images à la TikZ and its ilk, whose contribution can't be measured in the same way.

Is there an easy way to tell what is contributing to making a TeX PDF file large? Is there anything in the auxiliary files?

  • Don't you mean 'PDF file' in the last sentence? Commented May 15, 2018 at 17:29
  • Not an answer: Why do you push the reproducible results of the typesetting to GitHub? If you include all the source separately there is no need to also push the PDF. Most likely your images are the biggest culprits. To-Do notes won't take up much as they should be only text. The results of TikZ shouldn't be as big as a pixel graphic either (this of course highly depends on the number of paths in your TikZ foo). Commented May 15, 2018 at 17:29
  • You can get a rough estimate of how much your pictures increase the size of your PDF by using the draft option of the graphicx package. Commented May 15, 2018 at 21:28
  • I’m using GitHub as a backup of the whole project, not just a source code repo. Having the PDF also backed up and available online is useful for sharing the document with others who are not TeX-savvy. That’s by the by anyway, as it doesn’t really affect the outcome of this question. Commented May 16, 2018 at 7:50
  • If you want to keep your file size low, don't crop pictures using graphicx cropping methods; crop the files themselves (it may be worth keeping the originals as backups). Also scale their resolution down to something appropriate for their printed size (no need to include an image at 900 dpi if it gets printed at half its natural size; 150-300 dpi seems enough here). The human eye has its limitations on resolution. Commented May 16, 2018 at 11:26
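
To illustrate the cropping point in the last comment, here is a minimal sketch. It assumes pdfLaTeX and uses the placeholder graphic `example-image` from the `mwe` package, which is not part of the original question. As far as I know, `trim` with `clip` only hides part of the picture, and the complete image file is still embedded in the PDF.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% trim/clip masks the borders when the page is drawn,
% but the full image data still ends up in the PDF.
\includegraphics[trim=2cm 2cm 2cm 2cm, clip, width=0.4\linewidth]{example-image}
\end{document}
```

Cropping the file itself in an image editor and including the cropped copy without `trim` and `clip` avoids carrying the hidden borders along.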

1 Answer


I don't know what your project is composed of, but the usual culprits are the images. As suggested in the comments, you can check how much they contribute by simply compiling the .tex file with the draft option and comparing the resulting PDF size with that of a normal run.
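
A minimal sketch of that check follows. It assumes the images are included with `\includegraphics`, and `example-image` is a placeholder graphic from the `mwe` package, not a file from your project. Compile once as shown and once with `draft` removed; the difference in PDF size is roughly what the images contribute.

```latex
\documentclass{article}
% With 'draft', graphicx replaces every included graphic by an empty
% frame of the same size, so no image data is written to the PDF.
\usepackage[draft]{graphicx}
\begin{document}
\includegraphics[width=0.5\linewidth]{example-image}
\end{document}
```

Passing `draft` only to graphicx leaves TikZ pictures and everything else untouched, so this isolates the included image files. Giving `draft` as a class option instead would also change the behaviour of other packages.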

The source files of TikZ pictures usually don't take up much space, but the pictures can be large once compiled. I always suggest separating the TikZ pictures from the main file and including them in your document by using the standalone package in the parent file and the standalone class in the child file. That way you can compile a single TikZ picture on its own and see how large it is when compiled (of course, this is not the only advantage of using standalone).
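
A minimal sketch of that layout follows. The file names `figures/diagram.tex` and `main.tex` are hypothetical, and the picture is a trivial placeholder. The child file compiles on its own, so the size of the resulting `diagram.pdf` gives an estimate of what that picture adds:

```latex
% figures/diagram.tex (hypothetical name): compiles by itself
\documentclass[tikz]{standalone} % the 'tikz' option loads TikZ
\begin{document}
\begin{tikzpicture}
  \draw (0,0) rectangle (2,1);
  \node at (1,0.5) {demo};
\end{tikzpicture}
\end{document}
```

The parent file then pulls the same source in unchanged:

```latex
% main.tex (hypothetical name)
\documentclass{article}
\usepackage{standalone} % makes \input skip the child's preamble
\usepackage{tikz}       % packages the child needs are loaded here
\begin{document}
\input{figures/diagram}
\end{document}
```

The standalone PDF carries its own overhead (fonts, for instance), so treat its size as an upper-bound estimate rather than an exact figure.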

You said in the comments that you use your GitHub repo as a backup rather than as a source code repo, but the purpose of a GitHub repo is precisely to hold source. Even if you use it as a backup, though, there is no need to commit the files created by compilation: just add them to .gitignore and add a readme with instructions for compiling correctly. The final PDF is a different matter; you can keep it to share with others.

  • Yep, I realise that I'm 'abusing' GitHub a little; however, it means I can back up my whole document with pretty much a single command (beats the hell out of copying to a backup drive or something manually). I don't actually have much in the way of TikZ pictures, but I am using the TeXShade package, which takes a while to compile, though I suspect it may not be increasing the size much (I'd like some way of knowing this for sure, if there is one). Commented May 16, 2018 at 9:49
  • As I said, by simply adding the .aux, .idx, etc. files to .gitignore you can still back up with a single command; the folder is just cleaner. It's good practice, when your document is a book, a thesis, or even an article or a report of a few pages, to structure it as a modular document. This has several advantages, one of which is knowing how much each part contributes to the total size. However, you only get that with the standalone solution; \input and \include have other benefits but not this one. Commented May 16, 2018 at 9:57
  • If you are not familiar with modular documents, read about them on Wikipedia, and read on tex.stackexchange about the difference between \input and \include. Commented May 16, 2018 at 9:59
  • The document is already modular (it is a thesis with various \input chapters). I don't believe there is anything wrong with how I have built the document. Ignoring the auxiliary files doesn't answer the question though. I'm not concerned by the overall size of the project/repo, I'm interested in ways of keeping the final PDF size down, without having to compress after the fact for instance. I figure it might be a generally useful process if one is able to identify unambiguously the major contributors to the document size (then I could go and compress them some more individually say) Commented May 16, 2018 at 10:07
  • My answer is more focused on how to check which parts contribute most to the size of the final PDF. If you want a stupid and obvious answer to your question (I'm saying that the answer is stupid, not your question), it is simply not to insert high-resolution images, since for large documents the only possible culprits are the pictures (TikZ, png, jpg, eps, or whatever). Commented May 16, 2018 at 10:12
