{"id":154702,"date":"2019-09-03T09:57:58","date_gmt":"2019-09-03T13:57:58","guid":{"rendered":"https:\/\/www.countingpips.com\/?p=154702"},"modified":"2019-09-03T09:57:58","modified_gmt":"2019-09-03T13:57:58","slug":"10-essential-data-science-packages-for-python","status":"publish","type":"post","link":"https:\/\/www.investmacro.com\/forex\/2019\/09\/10-essential-data-science-packages-for-python\/","title":{"rendered":"10 Essential Data Science Packages for Python"},"content":{"rendered":"<div id=\"inves-1450159410\" class=\"inves-below-title-posts inves-entity-placement\"><div id =\"posts_date_custom\"><div align=\"left\">September 3, 2019<\/div><hr style=\"border: none; border-bottom: 3px solid black;\">\r\n<\/div><\/div><p><strong>By TJ Simmons for <a href=\"http:\/\/kite.com\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Kite.com<\/strong><\/a><\/strong><\/p>\n<div class=\"homepage__section\">\n<div class=\"homepage__section__content blog__content\">\n<div class=\"content-block\">\n<p>Interest in data science <a href=\"https:\/\/trends.google.com\/trends\/explore?date=2013-01-12%202019-04-19&amp;q=data%20science\">has risen remarkably<\/a> in the last five years. And while there are many programming languages suited for data science and machine learning, <a href=\"https:\/\/trends.google.com\/trends\/explore?date=2013-01-12%202019-04-19&amp;q=data%20science%20python,data%20science%20r,data%20science%20java,data%20science%20julia,data%20science%20sql\">Python is the most popular<\/a>.<\/p>\n<p>Since it\u2019s the language of choice for machine learning, here\u2019s a Python-centric roundup of ten essential data science packages, including the <a href=\"https:\/\/kite.com\/blog\/python\/python-machine-learning-libraries\/\">most popular machine learning packages<\/a>.<\/p>\n<h2>Scikit-Learn<\/h2>\n<p>Scikit-Learn is a Python module for machine learning built on top of SciPy and NumPy. David Cournapeau started it as a Google Summer of Code project. 
Since then, it\u2019s grown to over <a href=\"https:\/\/github.com\/scikit-learn\/scikit-learn\">20,000 commits<\/a> and more than 90 releases. Companies such as J.P. Morgan and Spotify use it in their data science work.<\/p>\n<p>Because Scikit-Learn has such a gentle learning curve, even the people on the business side of an organization can use it. For example, a <a href=\"https:\/\/scikit-learn.org\/stable\/auto_examples\/index.html#examples-based-on-real-world-datasets\">range of tutorials<\/a> on the Scikit-Learn website shows you how to analyze real-world data sets. If you\u2019re a beginner and want to pick up a machine learning library, Scikit-Learn is the one to start with.<\/p>\n<p>Here\u2019s what it requires:<\/p>\n<ul>\n<li>Python 3.5 or higher.<\/li>\n<li>NumPy 1.11.0 or higher.<\/li>\n<li>SciPy 0.17.0 or higher.<\/li>\n<\/ul>\n<h2><b>PyTorch<\/b><\/h2>\n<p>PyTorch does two things very well. First, it accelerates tensor computation with strong GPU support. Second, it builds dynamic neural networks on a tape-based autograd system, thus allowing reuse and greater performance. If you\u2019re an academic or an engineer who wants an easy-to-learn package to perform these two things, PyTorch is for you.<\/p>\n<p>PyTorch is excellent in specific cases. For instance, do you want to compute tensors faster by using a GPU, as I mentioned above? Use PyTorch because you can\u2019t do that with NumPy. Want to use an RNN for language processing? Use PyTorch because of its define-by-run feature. Or do you want to use deep learning but you\u2019re just a beginner? Use PyTorch because Scikit-Learn doesn\u2019t cater to deep learning.<\/p>\n<p>Requirements for PyTorch depend on your operating system. The installation is slightly more complicated than, say, Scikit-Learn\u2019s. I recommend using the <a href=\"https:\/\/pytorch.org\/get-started\/locally\/\">\u201cGet Started\u201d page<\/a> for guidance. 
It usually requires the following:<\/p>\n<ul>\n<li>Python 3.6 or higher.<\/li>\n<li>Conda 4.6.0 or higher.<\/li>\n<\/ul>\n<\/div>\n<div class=\"content-block\">\n<h2><b>Caffe<\/b><\/h2>\n<p>Caffe is one of the fastest implementations of a convolutional network, making it ideal for image recognition. It\u2019s best for\u00a0processing images.<\/p>\n<p><a href=\"http:\/\/daggerfs.com\/\">Yangqing Jia<\/a> started Caffe while working on his PhD at UC Berkeley. It\u2019s released under the <a href=\"https:\/\/github.com\/BVLC\/caffe\/blob\/master\/LICENSE\">BSD 2-Clause license<\/a>, and it\u2019s touted as one of the fastest-performing deep-learning frameworks out there. According to the <a href=\"http:\/\/caffe.berkeleyvision.org\">website<\/a>, Caffe\u2019s image processing is quite astounding. They claim it can process \u201c<a href=\"https:\/\/github.com\/Cassie94\/LSTD\/blob\/master\/docs\/index.md\">over 60M images per day with a single NVIDIA K40 GPU<\/a>.\u201d<\/p>\n<p>I should highlight that Caffe assumes you have at least a mid-level knowledge of machine learning, although the learning curve is still relatively gentle.<\/p>\n<p>As with PyTorch, requirements depend on your operating system. Check the installation guide <a href=\"http:\/\/caffe.berkeleyvision.org\/installation.html\">here<\/a>. I recommend using the Docker version if you can so it works right out of the box. 
The compulsory dependencies are below:<\/p>\n<ul>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuda-zone\">CUDA<\/a> for GPU mode.\n<ul>\n<li>Library version 7 or higher and the latest driver version are recommended, but releases in the 6s are fine too.<\/li>\n<li>Versions 5.5 and 5.0 are compatible but considered legacy.<\/li>\n<\/ul>\n<\/li>\n<li><a href=\"http:\/\/en.wikipedia.org\/wiki\/Basic_Linear_Algebra_Subprograms\">BLAS<\/a> via ATLAS, MKL, or OpenBLAS.<\/li>\n<li><a href=\"http:\/\/www.boost.org\/\">Boost<\/a> 1.55 or higher.<\/li>\n<\/ul>\n<h2><b>TensorFlow<\/b><\/h2>\n<p>TensorFlow is one of the most famous machine learning libraries for some very good reasons. It specializes in numerical computation using dataflow graphs.<\/p>\n<p>Originally developed by Google Brain, TensorFlow is open source. It uses <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dataflow_programming\">dataflow<\/a>\u00a0graphs and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Differentiable_programming\">differentiable programming<\/a> across a range of tasks, making it one of the most flexible and powerful machine learning libraries ever created.<\/p>\n<p>If you need to process large data sets quickly, this is a library you shouldn\u2019t ignore.<\/p>\n<p>The most recent stable version is v1.13.1, but the new v2.0 is in beta now.<\/p>\n<h2><b>Theano<\/b><\/h2>\n<p>Theano is one of the earliest open-source software libraries for deep-learning development. It\u2019s best for high-speed computation.<\/p>\n<p>While Theano\u2019s developers announced that they would stop major development after the release of v1.0 in 2017, you can still study it for historical reasons. 
It\u2019s made this list of top ten data science packages for Python because if you familiarize yourself with it, you\u2019ll get a sense of how its innovations later evolved into the features you now see in competing libraries.<\/p>\n<h2><b>Pandas<\/b><\/h2>\n<p>Pandas is a powerful and flexible data analysis library written in Python. While not strictly a machine learning library, it\u2019s well-suited for data analysis and manipulation for large data sets. In particular, I enjoy using it for its data structures, such as the DataFrame, the time series manipulation and analysis, and the numerical data tables. Many business-side employees of large organizations and startups can easily pick up Pandas to perform analysis. Plus, it\u2019s fairly easy to learn, and it rivals competing libraries in terms of its features in data analysis.<\/p>\n<p>If you want to use Pandas, here\u2019s what you\u2019ll need:<\/p>\n<ul>\n<li><a href=\"https:\/\/setuptools.readthedocs.io\/en\/latest\/\">Setuptools<\/a> version 24.2.0 or higher.<\/li>\n<li><a href=\"http:\/\/www.numpy.org\/\">NumPy<\/a> version 1.12.0 or higher.<\/li>\n<li><a href=\"https:\/\/dateutil.readthedocs.io\/en\/stable\/\">Python dateutil <\/a>2.5.0 or higher.<\/li>\n<li><a href=\"http:\/\/pytz.sourceforge.net\/\">pytz<\/a> for cross-platform timezone calculations.<\/li>\n<\/ul>\n<h2><b>Keras<\/b><\/h2>\n<p>Keras is built for fast experimentation. It\u2019s capable of running on top of other frameworks like TensorFlow, too. Keras is best for easy and fast prototyping as a deep learning library.<\/p>\n<p>Keras is popular amongst deep learning library aficionados for its easy-to-use API. 
Jeff Hale created a compilation that ranked the major deep learning frameworks, and <a href=\"https:\/\/towardsdatascience.com\/deep-learning-framework-power-scores-2018-23607ddf297a\">Keras compares very well<\/a>.<\/p>\n<p>The only requirement for Keras is one of three possible backend engines: TensorFlow, Theano, or <a href=\"https:\/\/docs.microsoft.com\/en-us\/cognitive-toolkit\/\">CNTK<\/a>.<\/p>\n<\/div>\n<div class=\"content-block\">\n<h2><b>NumPy<\/b><\/h2>\n<p>NumPy is the fundamental package needed for scientific computing with Python. It\u2019s an excellent choice for researchers who want an easy-to-use Python library for scientific computing. In fact, NumPy was designed for this purpose; it makes array computing a lot easier.<\/p>\n<p>Originally, the code for NumPy was part of SciPy. However, scientists who needed only the array object in their work had to install the large SciPy package. To avoid that, a new package was separated from SciPy and called NumPy.<\/p>\n<p>If you want to use NumPy, you\u2019ll need Python 2.6.x, 2.7.x, 3.2.x, or newer.<\/p>\n<h2><b>Matplotlib<\/b><\/h2>\n<p>Matplotlib is a Python 2D plotting library that makes it easy to produce cross-platform charts and figures.<\/p>\n<p>So far in this roundup, we\u2019ve covered plenty of machine learning, deep learning, and even fast computational frameworks. But with data science, you also need to draw graphs and charts. When you talk about data science and Python, Matplotlib is what comes to mind for plotting and data visualization. It\u2019s\u00a0ideal for publication-quality charts and figures across platforms.<\/p>\n<p>For long-term support, the current stable version is v2.2.4, but you can get v3.0.3 for the latest features. The 3.x series requires Python 3, since support for Python 2 is being dropped.<\/p>\n<h2><b>SciPy<\/b><\/h2>\n<p>SciPy is a gigantic library of data science packages mainly focused on mathematics, science, and engineering. 
If you\u2019re a data scientist or engineer who wants the whole kitchen sink when it comes to running technical and scientific computing, you\u2019ve found your match with SciPy.<\/p>\n<p>Since it builds on top of NumPy, SciPy has the same target audience. It has a wide collection of subpackages, each focused on niches such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fourier_transform\">Fourier transforms<\/a>, signal processing, optimization algorithms, spatial algorithms, and nearest-neighbor search. Essentially, this is the companion Python library for your typical data scientist.<\/p>\n<p>As far as requirements go, you\u2019ll need NumPy if you want SciPy. But that\u2019s it.<\/p>\n<h2><b>Summary<\/b><\/h2>\n<p>This brings to an end my roundup of the 10 major data-science-related Python libraries. Is there something else you\u2019d like us to cover that also uses Python extensively? Let us know!<\/p>\n<p>And don\u2019t forget that <a href=\"https:\/\/kite.com\/\">Kite<\/a> can help you learn these packages faster with its ML-powered autocomplete as well as handy in-editor docs lookups. Check it out for free as an <a href=\"https:\/\/kite.com\/integrations\/\">IDE plugin<\/a> for any of the leading IDEs.<\/p>\n<\/div>\n<\/div>\n<p><em><strong>About the Author:<\/strong><\/em><\/p>\n<p><a href=\"https:\/\/kite.com\/blog\/python\/data-science-packages-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">This article<\/a> originally appeared on <a href=\"https:\/\/kite.com\" target=\"_blank\" rel=\"noopener noreferrer\">Kite.com<\/a>.<\/p>\n<p>(Reprinted with permission)<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By TJ Simmons for Kite.com Interest in data science has risen remarkably in the last five years. And while there are many programming languages suited for data science and machine learning, Python is the most popular. 
Since it\u2019s the language of choice for machine learning, here\u2019s a Python-centric roundup of ten essential data science packages, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":154703,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-154702","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/posts\/154702","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/comments?post=154702"}],"version-history":[{"count":1,"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/posts\/154702\/revisions"}],"predecessor-version":[{"id":154704,"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/posts\/154702\/revisions\/154704"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/media\/154703"}],"wp:attachment":[{"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/media?parent=154702"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/categories?post=154702"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.investmacro.com\/forex\/wp-json\/wp\/v2\/tags?post=154702"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}