Here at JobTeaser, we found ourselves duplicating a large amount of code across projects: text-cleaning operations for machine learning, AWS wrappers, and Kafka helpers.
Our solution was to extract the reusable code into proper packages (correctly tested and documented) and push them to our own package server.
The current solution relies on an EC2 machine that hosts a pypi server. The CI (here CircleCI) will build → test → push a new version of the package to the pypi server.
Once published, other projects can use the package in their CI process by requesting it from the pypi server.
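The publish and consume steps can be sketched as the commands a CI job would run. This is only an illustration, not our actual CircleCI configuration: the server URLs and the package name are hypothetical, and it assumes `twine` is used for the upload and `pip` for the install.

```python
import sys

# Hypothetical addresses of the private pypi server; replace with your own.
UPLOAD_URL = "https://pypi.example.internal/"
INDEX_URL = "https://pypi.example.internal/simple/"


def upload_command(repository_url, dist_files):
    """Command a CI job could run to push built distributions with twine."""
    return [
        sys.executable, "-m", "twine", "upload",
        "--repository-url", repository_url,
        *dist_files,
    ]


def install_command(package, index_url):
    """Command a consuming project could run to install the package.

    --extra-index-url adds the private server next to the public index,
    so public dependencies are still resolved as usual.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--extra-index-url", index_url,
        package,
    ]


if __name__ == "__main__":
    # "jt_text_cleaning" is a made-up package name used for illustration.
    print(" ".join(upload_command(UPLOAD_URL, ["dist/jt_text_cleaning-1.0.0.tar.gz"])))
    print(" ".join(install_command("jt-text-cleaning", INDEX_URL)))
```

Passing each list to `subprocess.run(..., check=True)` would execute the step and fail the job if the command returns a non-zero exit code.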
- Can’t combine default read-only access for everyone with an authenticated user.
- Needs fine-grained access control at the user level because CircleCI doesn’t have fixed IPs.
- Sometimes the pypi server hangs for no apparent reason and we have to restart it.
- All the data is stored on the server without any backup!
Because our infra already runs on AWS, we limited our search to solutions that use S3 as a backend. After some time, we found two solutions: S3pypi and Pypicloud.
As you can see, the strongest point of the S3pypi solution is that it is straightforward.
It effectively uses CloudFront to expose S3 content in the same way as a static website.
But at JobTeaser, we manage all our AWS resources using Terraform, so we would have to take their templating and adapt it. On top of that, the security relies only on the CloudFront setup.
- Setup doesn’t integrate easily with our tooling (Terraform)
- Doesn’t rely on user level access control
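To make the "static website" idea concrete: pip only needs plain HTML pages that link to the distribution files, following the simple repository layout from PEP 503. The sketch below is not S3pypi's actual code. It is a minimal illustration of the kind of page such a tool has to store in the bucket, and the function names and file names are made up for the example.

```python
import html
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_project_index(project, filenames):
    """Return the HTML page listing every distribution file of a project."""
    links = "\n".join(
        f'    <a href="{html.escape(fn, quote=True)}">{html.escape(fn)}</a><br>'
        for fn in sorted(filenames)
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <body>\n"
        f"    <h1>Links for {html.escape(normalize(project))}</h1>\n"
        f"{links}\n"
        "  </body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical package and files, used only for illustration.
    page = render_project_index(
        "JT_Text.Cleaning",
        ["jt_text_cleaning-1.0.0.tar.gz", "jt_text_cleaning-1.1.0.tar.gz"],
    )
    # The page would be stored under a key such as
    # simple/jt-text-cleaning/index.html, next to the files it links to.
    print(page)
```

Because the index is just static content, there is no server process to hang or restart, but also no place to enforce per-user permissions other than whatever sits in front of the bucket.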
On the other hand, we have Pypicloud, which goes in the opposite direction with a more complex and modular solution.
Like the current pypi solution, it requires a dedicated server, but it also needs external storage and a caching service. On the bright side, you can use any option you want for these components.
- Modular solution
- Has users and more configuration than pypi
- Nice GUI for admin
- Needs a more complex setup
- Costs are slightly higher
- Users need to be created from the admin interface