tl;dr: A list of useful resources for self-publishing a book on Amazon using Bookdown.
- Writing style
- Did I use any editor?
- How to create the book: Bookdown!
- Self-publishing on Amazon (Kindle and paperback)
- Costs and earnings
- Publishing outside Amazon: Gumroad
- Linking: B&W, color and Kindle on Amazon
- 11 tips for writing well
Update Aug-28-2018: I published another blog post related to this one, covering the more technical aspects of Bookdown.
A friend of mine told me to write down all the details about self-publishing a book.
So here you go—a long post explaining almost all the things I have done and discovered.
First of all, my thanks go to Bookdown. This R package allows enthusiastic people to self-publish a book! Although my book is about the R language, the same process works for any kind of book.
Kaylen Sanders from OpenDataScience.com did a poetic review of it.
The two-year journey culminated in two paperback versions available on Amazon (color and B&W), a Kindle version, an epub version, a PDF, and a website.
I didn't plan to write a book. Around six years ago, I started using R and, like many programmers, my "personal" library of shortcuts began to grow.
Then I thought that this library could help more people, so after putting a lot of things in order, I published the 'funModeling' package on CRAN in February 2016.
Google and the book R Packages by Hadley Wickham (April 2015) were incredibly useful. Don't hesitate to check out that book if you plan to write a package.
I deeply believe that when there is an explanation behind what we use and what we do, it changes the way we perceive the action. So, I started to document the funModeling package functions.
The documentation grew rapidly and soon outgrew the original scope of the package, expanding to general explanations of machine learning and data preparation. And then the first version of the book was born!
Two months after the release, I rewrote everything from scratch.
There are two key points here:
- We don't always have a clear goal; we just "walk" and the goal takes shape.
- The first version is not always the ultimate one; start now with the ideas you already have and let them grow.
I wanted to "write everything I know" (things that took me a lot of time to learn) and explain the concepts with examples, lots of examples, so readers can check them and draw their own conclusions.
The other key point was "how to interpret all the results." I found that when someone explains the analytical reasoning path, drawing different conclusions from the analysis, understanding of the topic improves a lot.
These two books follow that idea:
- Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han, Micheline Kamber, and Jian Pei (2012)
- Data Mining with R: Learning with Case Studies, by Luís Torgo (2011)
Note: Data Science = Data Mining + some marketing ;) Nowadays: Data Mining = web scraping
Did I use any editor?
Nope, you can 100% self-publish a book on your own, with patience and the Amazon self-publishing service.
Editors can help with the book's structure, proofreading, marketing, printing, and distribution, among other things. That saves time.
A friend of mine surprised me with a Facebook ad campaign for the book when I launched it. Apart from that, all marketing was done by word of mouth and some posts on the Data Science Heroes Blog.
Would you share it? ;)
How to create the book: Bookdown!
This amazing R package provides everything needed to create Kindle and paperback editions.
Get started with the minimum reproducible example at: https://bookdown.org/yihui/bookdown
The Data Science Live Book was 100% done using R and RStudio.
Bookdown alone should be a BIG point in any "Why R?" list.
You should google all of these terms before starting: LaTeX, YAML, knitr, R Markdown, Pandoc, GitBook. None of them are Pokémon.
Check RStudio's lessons on what R Markdown is: https://rmarkdown.rstudio.com/lesson-1.html
Self-publishing on Amazon
Amazon runs a program called Kindle Direct Publishing (KDP).
Publishing the paperback version
You upload the PDF and Amazon prints on demand. That's it. You don't have to invest money up front to buy a stock of copies before the release. After you publish, if one person in Antarctica buys a copy, it is printed and delivered.
There are other print-on-demand publishers, like lulu.com.
The print quality is excellent in both versions (color and B&W), but the color one is stunning! Colors really help understanding. However, the printing costs for the color version are around four times higher.
Amazon will check several layout points before approving the release.
Check the color version:
And one of the black-and-white version:
Note the quality of the plots and the code layout, which is pretty important in a programming book.
Publishing the Kindle version
Easier to publish than the paperback.
The Kindle version of the Data Science Live Book, here!
(Amazon is incredibly vast, from printing and publishing books to hosting deep learning workloads on AWS. Someday, Amazon and Google will be countries.)
You won't become rich publishing books unless you have a catchy title, like "Fifty Shades of Data in Grey."
Costs and earnings
There are two royalty options: 35% or 70%. We always want the higher one, right? Well, to get the 70% option, the book price must be US$9.99 at most.
Printing costs depend on several factors. On this page, you will find how costs and royalties are calculated, as well as an Excel file called "Printing cost calculator": https://kdp.amazon.com/en_US/help/topic/G201834340
Amazon royalties are around 40% of retail price.
Typically, royalties when using a publisher are around 8–12% of the retail price. Source here.
Having a publisher/editor can make several things easier, so don't rule one out based on earnings alone.
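To make the royalty options concrete, here is a minimal Python sketch comparing the gross royalty per copy under the two KDP options and a traditional publisher. It is a simplification, assuming royalty = rate × list price: it ignores KDP delivery fees, printing costs, and taxes, and the 10% publisher rate is only an assumed midpoint of the 8-12% range mentioned above. The function names are hypothetical.

```python
"""Rough royalty-per-copy comparison (a simplification, not KDP's real formula)."""

KDP_70_MAX_PRICE = 9.99  # price cap for the 70% option, as stated in the post
PUBLISHER_RATE = 0.10    # assumption: midpoint of the typical 8-12% publisher range


def kdp_royalty(price: float, option: str) -> float:
    """Gross royalty per copy; ignores delivery fees, printing costs and taxes."""
    if option == "70%":
        if price > KDP_70_MAX_PRICE:
            raise ValueError("the 70% option requires a list price of US$9.99 at most")
        return price * 0.70
    if option == "35%":
        return price * 0.35
    raise ValueError(f"unknown royalty option: {option!r}")


def publisher_royalty(price: float, rate: float = PUBLISHER_RATE) -> float:
    """Gross royalty per copy through a traditional publisher (assumed flat rate)."""
    return price * rate


if __name__ == "__main__":
    price = 9.99
    print(f"KDP 70%:   ${kdp_royalty(price, '70%'):.2f}")
    print(f"KDP 35%:   ${kdp_royalty(price, '35%'):.2f}")
    print(f"Publisher: ${publisher_royalty(price):.2f}")
```

At a US$9.99 list price this gives roughly $6.99, $3.50, and $1.00 per copy, before any fees. Asking for the 70% option above US$9.99 raises a `ValueError`, mirroring the price cap described above.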
Publishing outside Amazon: Gumroad
Gumroad is a service that allows users to sell different types of files online, e.g., music, videos, and data science books.
Gumroad provides a shopping cart and, after payment, the buyer automatically receives an email with the download link. It works really well! No one complains about the service. The pricing is affordable: "If you use the Free version of Gumroad, our fee is just 8.5% + 30 cents per transaction. If you get the Premium version of Gumroad for $10 (USD)/month, our fee is 3.5% + 30 cents per sale."
I started with the free version and then changed to premium.
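Switching from the free plan to premium only pays off above a certain sales volume. Using only the fees quoted above (8.5% + 30 cents per sale on the free plan; 3.5% + 30 cents per sale plus US$10/month on premium), a minimal Python sketch of the break-even point could look like this. It assumes those fee figures are still current and ignores taxes and any other charges; the function names are hypothetical.

```python
"""Gumroad free vs. premium plan: monthly fees and break-even (sketch)."""

FREE_RATE = 0.085        # 8.5% per sale on the free plan (quoted above)
PREMIUM_RATE = 0.035     # 3.5% per sale on the premium plan
FIXED_PER_SALE = 0.30    # 30 cents per sale on both plans
PREMIUM_MONTHLY = 10.00  # US$10 per month for premium


def monthly_fees(price: float, sales: int, premium: bool) -> float:
    """Total Gumroad fees for one month of `sales` copies sold at `price`."""
    rate = PREMIUM_RATE if premium else FREE_RATE
    fees = sales * (rate * price + FIXED_PER_SALE)
    if premium:
        fees += PREMIUM_MONTHLY
    return fees


def break_even_revenue() -> float:
    """Monthly gross revenue at which both plans cost the same.

    The 30-cent fixed fee appears on both sides and cancels out:
    FREE_RATE * R = PREMIUM_RATE * R + PREMIUM_MONTHLY
    """
    return PREMIUM_MONTHLY / (FREE_RATE - PREMIUM_RATE)


if __name__ == "__main__":
    revenue = break_even_revenue()  # about US$200 of sales per month
    min_price = 5.00                # minimum "name your price" amount
    print(f"Break-even revenue: ${revenue:.2f}/month")
    print(f"That is about {revenue / min_price:.0f} sales at ${min_price:.2f}")
```

At the US$5 minimum price that is roughly 40 sales a month: below that, the free plan is cheaper; above it, premium's lower percentage fee wins.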
One of the most useful features is that Gumroad lets you embed the payment form in your website. You can check mine here.
The other useful feature is "name your price." The minimum price to download the Data Science Live Book is US$5, and the buyer gets all three versions: PDF, .mobi, and .epub.
While I was writing this post, I saw that 37% of buyers spent more than the minimum—I'm happy you like the project!
This is the list of countries of unique buyers who purchased through Gumroad, so it works worldwide.
Did you find the outlier? :P
Always try to share the book before its release.
Proofreading is needed at two levels: technical and grammatical.
Regarding the technical aspects, the proofreading was mainly done by Pablo Seibelt, Head of Data at Auth0. I also made some changes based on people's feedback (thanks!).
Regarding the grammar check, I tried several English teachers before settling on one outstanding freelancer, Dr. Candy Pettus, from www.fiverr.com (a site for hiring freelancers).
The tree was generated by a short, iterative algorithm, a reminder that simplicity is the seed of complexity. Like fractals. Like the Lorenz attractor.
And like nature itself...