Get Informed out of Data


Basic DataScience Packages in Julia to get started


1. IJulia
Enables the use of Jupyter notebooks or JupyterLab with Julia. This is a helpful environment for programming as well as for creating reports and exporting them to HTML, Markdown, PDF, etc. It also provides options for setting up additional Julia kernels.
Documentation:

2. DataFrames
Julia's answer to pandas in Python, or tidyr/dplyr in R's tidyverse. It provides the DataFrame object, which is the basis for much data analysis and wrangling, along with functions for selecting columns, filtering rows, sorting datasets, creating new variables, joining tables, converting datasets from wide to long, etc. You can use the CSV package to read in datasets, or create them yourself. It also has functions inspired by Hadley Wickham's split-apply-combine approach and a very helpful describe() function.
Documentation:

3. Plots
A basic, easy-to-use visualization library that can be thought of as a common interface to various other plotting libraries. It supports several backends, most notably Plotly. It is very customizable, offering options for layouts, colors, attributes, and objects. Note that there are also "recipes" (extensions of the Plots framework) that enable Plots to perform different plot commands, use different functions, and handle different data types.
Documentation:

4. VegaLite
A visualization library for Julia that follows a grammar-of-graphics framework, even more so than the Gadfly library. Its core macro is @vlplot.
Documentation:

5. RCall
As the name suggests, the RCall package enables the use of R code in Julia, either from the Julia REPL (for example, in Juno) or from Jupyter. It is particularly helpful because objects can be created in R and passed to Julia functions, or vice versa.
Documentation:

6. Distributions
Can be used for creating statistical distribution objects as well as sampling from them. This includes the Normal, Exponential, Uniform, Binomial, and Gamma distributions, and more. Another very helpful feature is fitting a theoretical distribution to empirical data.
Documentation:

7. PrettyTables
Can be used for formatting tables, using text, HTML, or LaTeX backends. It is also customizable, with options for alignment, showing only rows that satisfy certain conditions, etc.
Documentation:

8. GLM
The GLM package is helpful for fitting either a linear regression model, with accessor functions for quantities such as R² and the coefficient estimates, or other generalized linear models.
Documentation:

9. ScikitLearn
The scikit-learn package from Python has an implementation in Julia, and it is just as useful there, working quite similarly while also offering Julia-based models on top of the standard Python ones. Types of models include supervised learning, unsupervised learning, and dataset transformations; the package also offers capabilities for cross-validation, hyperparameter tuning, etc.
Documentation:

10. Flux
Flux is a Julia package for machine learning and deep learning. It provides a lot of flexibility by building on a key feature: taking gradients of other Julia code. Features include defining loss functions and gradient descent, building models from layers, regularization, and training models. This is a fairly technical package, but it comes with a repository called the "model zoo" which does a nice job of showcasing the package's capabilities.
Documentation:
Model Zoo:
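To give a feel for how a few of these packages fit together, here is a minimal sketch that combines Distributions, DataFrames, and GLM. It assumes those packages are already installed; the data are synthetic, and the column names x, y, and group are made up for illustration rather than taken from any real dataset.

```julia
using DataFrames, Distributions, GLM, Random, Statistics

Random.seed!(42)  # fixed seed so repeated runs draw the same sample

# Distributions: create distribution objects and sample from them
n = 200
x = rand(Uniform(0, 10), n)                      # predictor drawn from Uniform(0, 10)
y = 2.0 .+ 3.0 .* x .+ rand(Normal(0, 1), n)     # linear signal plus Normal(0, 1) noise

# DataFrames: hold the data, create a new variable, filter rows
df = DataFrame(x = x, y = y)
df.group = ifelse.(df.x .> 5, "high", "low")     # hypothetical grouping column
high = filter(:x => >(5), df)                    # keep only rows with x > 5
summary_table = describe(df)                     # per-column summary statistics

# Split-apply-combine: mean of y within each group
by_group = combine(groupby(df, :group), :y => mean => :mean_y)

# Distributions: fit a Normal distribution to the observed y values
fitted = fit(Normal, df.y)

# GLM: ordinary least squares regression of y on x
model = lm(@formula(y ~ x), df)
println(coef(model))   # intercept and slope estimates
println(r2(model))     # coefficient of determination
```

Because the data are simulated from y = 2 + 3x plus noise, the estimated slope should land close to 3, which makes this a handy sanity check when trying the packages for the first time.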
