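Commonly Used Websites for Data Analysis (continuously updated)

Additions are welcome; just leave a comment below. The list is not limited to R, Excel, and SQL; Python users and statisticians are welcome too.
I will also post competitions that involve big data analysis projects from time to time. Feel free to team up.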

Big Data Competitions

Competition Announcements

R

Foundations


Please follow <https://github.com/qinwf/awesome-R>, a list that many people on GitHub actively maintain (mostly foreign websites and GitHub repositories, but very useful).

Awesome R, reposted from https://github.com/qinwf/awesome-R

A curated list of awesome R packages and tools. Inspired by awesome-machine-learning.

For better navigation, see https://awesome-r.com

Entries marked "heart" are among the [Top 50](https://github.com/rstudio/RStartHere/blob/master/top_downloads_2016/top_packages) most-downloaded CRAN packages or are repos with 400+ stars.

Integrated Development Environments

Integrated Development Environment

  • RStudio heart - A powerful and productive user interface for R. Works great on Windows, Mac, and Linux.
  • Emacs + ESS - Emacs Speaks Statistics is an add-on package for emacs text editors.
  • Sublime Text + R-Box - Add-on package for Sublime Text 2/3.
  • TextMate + r.tmbundle - Add-on package for TextMate 1/2.
  • StatET - An Eclipse based IDE for R.
  • Revolution R Enterprise - Revolution R is offered free to academic users, while the commercial software focuses on big data and large-scale multiprocessor functionality.
  • R Commander - A package that provides a basic graphical user interface.
  • IRkernel heart - R kernel for Jupyter.
  • Deducer - A menu-driven data analysis GUI with a spreadsheet-like data editor.
  • Radiant - A platform-independent browser-based interface for business analytics in R, based on Shiny.
  • Vim-R - Vim plugin for R.
  • Nvim-R - Neovim plugin for R.
  • JASP - A complete package for both Bayesian and Frequentist methods, that is familiar to users of SPSS.
  • Bio7 - An IDE containing tools for model creation, scientific image analysis, and statistical analysis for ecological modelling.
  • RTVS - R Tools for Visual Studio.

Syntax

Packages that change the way you use R.

  • magrittr heart - Let's pipe it.
  • pipeR - Multi-paradigm Pipeline Implementation.
  • lambda.r - Functional programming and simple pattern matching in R.
  • purrr - An FP package for R in the spirit of underscore.js.

Data Manipulation

Packages for cooking data.

  • dplyr heart - Fast data frame manipulation and database queries.
  • data.table heart - Fast data manipulation in a short and flexible syntax.
  • reshape2 heart - Flexibly rearrange, reshape, and aggregate data.
  • readr - A fast and friendly way to read tabular data into R.
  • haven - Improved methods to import SPSS, Stata and SAS files in R.
  • tidyr - Easily tidy data with spread and gather functions.
  • broom - Convert statistical analysis objects into tidy data frames.
  • rlist - A toolbox for non-tabular data manipulation with lists.
  • jsonlite - A robust and quick way to parse JSON files in R.
  • ff - Data structures designed to store large datasets.
  • lubridate - A set of functions to work with dates and times.
  • stringi heart - ICU based string processing package.
  • stringr heart - Consistent API for string processing, built on top of stringi.
  • bigmemory - Shared memory and memory-mapped matrices. The big* packages provide additional tools including linear models (biglm) and Random Forests (bigrf).
  • fuzzyjoin - Join tables together on inexact matching.

Graphic Displays

Packages for showing data.

  • ggplot2 heart - An implementation of the Grammar of Graphics.
  • ggfortify - A unified ggplot2 interface to popular statistical packages using one line of code.
  • ggrepel - Repel overlapping text labels away from each other.
  • ggalt - Extra Coordinate Systems, Geoms and Statistical Transformations for ggplot2.
  • ggplot2 Extensions - Showcases of ggplot2 extensions.
  • lattice - A powerful and elegant high-level data visualization system.
  • rgl - 3D visualization device system for R.
  • Cairo - R graphics device using cairo graphics library for creating high-quality display output.
  • extrafont - Tools for using fonts in R graphics.
  • showtext - Enable R graphics device to show text using system fonts.
  • animation - A simple way to produce animated graphics in R, using ImageMagick.
  • gganimate - Create easy animations with ggplot2.
  • misc3d - Powerful functions to deal with 3d plots, isosurfaces, etc.
  • xkcd - Use xkcd style in graphs.
  • imager - An image processing package based on CImg library to work with images and display them.

HTML Widgets

Packages for interactive visualizations.

  • d3heatmap - Interactive heatmaps with D3.
  • DataTables - Displays R matrices or data frames as interactive HTML tables.
  • DiagrammeR heart - Create JS graph diagrams and flowcharts in R.
  • dygraphs - Charting time-series data in R.
  • formattable - Formattable Data Structures.
  • ggvis - Interactive grammar of graphics for R.
  • Leaflet - One of the most popular JavaScript libraries for interactive maps.
  • MetricsGraphics - Enables easy creation of D3 scatterplots, line charts, and histograms.
  • networkD3 - D3 JavaScript Network Graphs from R.
  • scatterD3 - Interactive scatterplots with D3.
  • plotly heart - Interactive ggplot2 and Shiny plotting with plot.ly.
  • rCharts heart - Interactive JS Charts from R.
  • rbokeh - R Interface to Bokeh.
  • threejs - Interactive 3D scatter plots and globes.

Reproducible Research

Packages for literate programming.

  • knitr heart - Easy dynamic report generation in R.
  • xtable - Export tables to LaTeX or HTML.
  • rapport - An R templating system.
  • rmarkdown heart - Dynamic documents for R.
  • slidify - Generate reproducible html5 slides from R markdown.
  • Sweave - A package designed to write LaTeX reports using R.
  • texreg - Formatting statistical models in LaTeX and HTML.
  • checkpoint - Install packages from snapshots on the checkpoint server.
  • brew - Pre-compute data to enhance your report templates. Can be combined with knitr.
  • ReporteRs - An R package to generate Microsoft Word, Microsoft PowerPoint and HTML reports.
  • bookdown - Authoring Books with R Markdown.

Web Technologies and Services

Packages to surf the web.

  • Web Technologies List - Information about how to use R and the world wide web together.
  • shiny heart - Easy interactive web applications with R.
  • RCurl - General network (HTTP/FTP/...) client interface for R.
  • httr heart - User-friendly RCurl wrapper.
  • httpuv - HTTP and WebSocket server library.
  • XML heart - Tools for parsing and generating XML within R.
  • rvest heart - Simple web scraping for R, using CSS selector or XPath syntax.
  • OpenCPU heart - HTTP API for R.
  • Rfacebook - Access to Facebook API via R.
  • RSiteCatalyst - R client library for the Adobe Analytics.

Parallel Computing

Packages for parallel computing.

  • parallel - Starting with release 2.14.0, R includes a new package, parallel, incorporating (slightly revised) copies of the packages multicore and snow.
  • Rmpi - Rmpi provides an interface (wrapper) to MPI APIs. It also provides an interactive R slave environment.
  • foreach heart - Executing the loop in parallel.
  • SparkR heart - R frontend for Spark.
  • DistributedR - A scalable high-performance platform from HP Vertica Analytics Team.
  • ddR - Provides distributed data structures and simplifies distributed computing in R.

High Performance

Packages for making R faster.

  • Rcpp heart - Rcpp provides a powerful C++ API on top of R, making R functions much faster.
  • Rcpp11 - Rcpp11 is a complete redesign of Rcpp, targeting C++11.
  • compiler - Speed up your R code using the JIT.

Language API

Packages for other languages.

  • rJava - Low-level R to Java interface.
  • jvmr - Integration of R, Java, and Scala.
  • rJython - R interface to Python via Jython.
  • rPython - Package allowing R to call Python.
  • runr - Run Julia and Bash from R.
  • RJulia - R package to call Julia.
  • RinRuby - A Ruby library that integrates the R interpreter in Ruby.
  • R.matlab - Read and write of MAT files together with R-to-MATLAB connectivity.
  • RcppOctave - Seamless Interface to Octave and Matlab.
  • RSPerl - A bidirectional interface for calling R from Perl and Perl from R.
  • V8 - Embedded JavaScript Engine.
  • htmlwidgets - Bring the best of JavaScript data visualization to R.
  • rpy2 - Python interface for R.

Database Management

Packages for managing data.

  • RODBC - ODBC database access for R.
  • DBI - Defines a common interface between the R and database management systems.
  • elastic - Wrapper for the Elasticsearch HTTP API
  • mongolite - Streaming Mongo Client for R
  • RMySQL - R interface to the MySQL database.
  • ROracle - OCI based Oracle database interface for R.
  • RPostgreSQL - R interface to the PostgreSQL database system.
  • RSQLite - SQLite interface for R
  • RJDBC - Provides access to databases through the JDBC interface.
  • rmongodb - R driver for MongoDB.
  • rredis - Redis client for R.
  • RCassandra - Direct interface (not Java) to the most basic functionality of Apache Cassandra.
  • RHive - R extension facilitating distributed computing via Apache Hive.
  • RNeo4j - Neo4j graph database driver.

Machine Learning

Packages for making R cleverer.

  • AnomalyDetection heart - AnomalyDetection R package from Twitter.
  • ahaz - Regularization for semiparametric additive hazards regression.
  • arules - Mining Association Rules and Frequent Itemsets
  • bigrf - Big Random Forests: Classification and Regression Forests for
    Large Data Sets
  • bigRR - Generalized Ridge Regression (with special advantage for p >> n
    cases)
  • bmrm - Bundle Methods for Regularized Risk Minimization Package
  • Boruta - A wrapper algorithm for all-relevant feature selection
  • BreakoutDetection - Breakout Detection via Robust E-Statistics from Twitter.
  • bst - Gradient Boosting
  • CausalImpact - Causal inference using Bayesian structural time-series models.
  • C50 - C5.0 Decision Trees and Rule-Based Models
  • caret heart - Classification and Regression Training
  • Clever Algorithms For Machine Learning
  • CORElearn - Classification, regression, feature evaluation and ordinal
    evaluation
  • CoxBoost - Cox models by likelihood based boosting for a single survival
    endpoint or competing risks
  • Cubist - Rule- and Instance-Based Regression Modeling
  • e1071 - Misc Functions of the Department of Statistics (e1071), TU Wien
  • earth - Multivariate Adaptive Regression Spline Models
  • elasticnet - Elastic-Net for Sparse Estimation and Sparse PCA
  • ElemStatLearn - Data sets, functions and examples from the book: "The Elements
    of Statistical Learning, Data Mining, Inference, and
    Prediction" by Trevor Hastie, Robert Tibshirani and Jerome
    Friedman
  • evtree - Evolutionary Learning of Globally Optimal Trees
  • FSelector - A feature selection framework, based on subset-search or feature ranking approaches.
  • frbs - Fuzzy Rule-based Systems for Classification and Regression Tasks
  • GAMBoost - Generalized linear and additive models by likelihood based
    boosting
  • gamboostLSS - Boosting Methods for GAMLSS
  • gbm - Generalized Boosted Regression Models
  • glmnet heart - Lasso and elastic-net regularized generalized linear models
  • glmpath - L1 Regularization Path for Generalized Linear Models and Cox
    Proportional Hazards Model
  • GMMBoost - Likelihood-based Boosting for Generalized mixed models
  • grplasso - Fitting user specified models with Group Lasso penalty
  • grpreg - Regularization paths for regression models with grouped
    covariates
  • h2o heart - Deeplearning, Random forests, GBM, KMeans, PCA, GLM
  • hda - Heteroscedastic Discriminant Analysis
  • ipred - Improved Predictors
  • kernlab - kernlab: Kernel-based Machine Learning Lab
  • klaR - Classification and visualization
  • kohonen - Supervised and Unsupervised Self-Organising Maps.
  • lars - Least Angle Regression, Lasso and Forward Stagewise
  • lasso2 - L1 constrained estimation aka ‘lasso’
  • LiblineaR - Linear Predictive Models Based On The Liblinear C/C++ Library
  • lme4 heart - Mixed-effects models
  • LogicReg - Logic Regression
  • maptree - Mapping, pruning, and graphing tree models
  • mboost - Model-Based Boosting
  • Machine Learning For Hackers
  • mvpart - Multivariate partitioning
  • MXNet heart - MXNet brings flexible and efficient GPU computing and state-of-art deep learning to R.
  • ncvreg - Regularization paths for SCAD- and MCP-penalized regression
    models
  • nnet - Feed-forward Neural Networks and Multinomial Log-Linear Models
  • oblique.tree - Oblique Trees for Classification Data
  • pamr - Pam: prediction analysis for microarrays
  • party - A Laboratory for Recursive Partytioning
  • partykit - A Toolkit for Recursive Partytioning
  • penalized - L1 (lasso and fused lasso) and L2 (ridge) penalized estimation
    in GLMs and in the Cox model
  • penalizedLDA - Penalized classification using Fisher's linear discriminant
  • penalizedSVM - Feature Selection SVM using penalty functions
  • quantregForest - quantregForest: Quantile Regression Forests
  • randomForest - randomForest: Breiman and Cutler's random forests for classification and regression.
  • randomForestSRC - randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC).
  • rattle - Graphical user interface for data mining in R.
  • rda - Shrunken Centroids Regularized Discriminant Analysis
  • rdetools - Relevant Dimension Estimation (RDE) in Feature Spaces
  • REEMtree - Regression Trees with Random Effects for Longitudinal (Panel)
    Data
  • relaxo - Relaxed Lasso
  • rgenoud - R version of GENetic Optimization Using Derivatives
  • rgp - R genetic programming framework
  • Rmalschains - Continuous Optimization using Memetic Algorithms with Local
    Search Chains (MA-LS-Chains) in R
  • rminer - Simpler use of data mining methods (e.g. NN and SVM) in
    classification and regression
  • ROCR - Visualizing the performance of scoring classifiers
  • RoughSets - Data Analysis Using Rough Set and Fuzzy Rough Set Theories
  • rpart - Recursive Partitioning and Regression Trees
  • RPMM - Recursively Partitioned Mixture Model
  • RSNNS - Neural Networks in R using the Stuttgart Neural Network
    Simulator (SNNS)
  • RWeka - R/Weka interface
  • RXshrink - RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least
    Angle Regression
  • sda - Shrinkage Discriminant Analysis and CAT Score Variable Selection
  • SDDA - Stepwise Diagonal Discriminant Analysis
  • SuperLearner and subsemble - Multi-algorithm ensemble learning packages.
  • svmpath - svmpath: the SVM Path algorithm
  • tgp - Bayesian treed Gaussian process models
  • tree - Classification and regression trees
  • varSelRF - Variable selection using random forests
  • xgboost heart - eXtreme Gradient Boosting Tree model, well known for its speed and performance.

Natural Language Processing

Packages for Natural Language Processing.

  • text2vec - Fast Text Mining Framework for Vectorization and Word Embeddings.
  • tm - A comprehensive text mining framework for R.
  • openNLP - Apache OpenNLP Tools Interface.
  • koRpus - An R Package for Text Analysis.
  • zipfR - Statistical models for word frequency distributions.
  • NLP - Basic functions for Natural Language Processing.
  • LDAvis - Interactive visualization of topic models.
  • topicmodels - Topic modeling interface to the C code developed by David M. Blei for Latent Dirichlet Allocation (LDA) and Correlated Topic Models (CTM).
  • syuzhet - Extracts sentiment from text using three different sentiment dictionaries.
  • SnowballC - Snowball stemmers based on the C libstemmer UTF-8 library.
  • quanteda - R functions for Quantitative Analysis of Textual Data.
  • Topic Models Resources - Topic Models learning and R related resources.
  • NLP for Chinese - NLP related resources in R (in Chinese).

Bayesian

Packages for Bayesian Inference.

  • coda - Output analysis and diagnostics for MCMC.
  • mcmc - Markov Chain Monte Carlo.
  • MCMCpack - Markov chain Monte Carlo (MCMC) Package.
  • R2WinBUGS - Running WinBUGS and OpenBUGS from R / S-PLUS.
  • BRugs - R interface to the OpenBUGS MCMC software.
  • rjags - R interface to the JAGS MCMC library.
  • rstan heart - R interface to the Stan MCMC software.

Optimization

Packages for Optimization.

  • minqa - Derivative-free optimization algorithms by quadratic approximation.
  • nloptr - NLopt is a free/open-source library for nonlinear optimization.
  • lpSolve - Interface to Lp_solve to Solve Linear/Integer Programs.

Finance

Packages for dealing with money.

  • quantmod heart - Quantitative Financial Modelling & Trading Framework for R.
  • TTR - Functions and data to construct technical trading rules with R.
  • PerformanceAnalytics - Econometric tools for performance and risk analysis.
  • zoo heart - S3 Infrastructure for Regular and Irregular Time Series.
  • xts - eXtensible Time Series.
  • tseries - Time series analysis and computational finance.
  • fAssets - Analysing and Modelling Financial Assets.

Bioinformatics

Packages for processing biological datasets.

  • Bioconductor heart - Tools for the analysis and comprehension of high-throughput genomic data.
  • genetics - Classes and methods for handling genetic data.
  • gap - An integrated package for genetic data analysis of both population and family data.
  • ape - Analyses of Phylogenetics and Evolution.
  • pheatmap - Pretty heatmaps made easy.

Network Analysis

Packages to construct, analyze and visualize network data.

  • Network Analysis List - Network Analysis related resources.
  • igraph heart - A collection of network analysis tools.
  • network - Basic tools to manipulate relational data in R.
  • sna - Basic network measures and visualization tools.
  • networkDynamic - Support for dynamic, (inter)temporal networks.
  • ndtv - Tools to construct animated visualizations of dynamic network data in various formats.
  • statnet - The project behind many R network analysis packages.
  • ergm - Exponential random graph models in R.
  • latentnet - Latent position and cluster models for network objects.
  • tnet - Network measures for weighted, two-mode and longitudinal networks.
  • rgexf - Export network objects from R to GEXF, for manipulation with network software like Gephi or Sigma.
  • visNetwork - Using vis.js library for network visualization.

R Development

Packages for packages.

  • Package Development List - R packages to improve package development.
  • devtools heart - Tools to make an R developer's life easier.
  • testthat heart - An R package to make testing fun.
  • R6 heart - A simpler, faster, lighter-weight alternative to R's built-in classes.
  • pryr heart - Make it easier to understand what's going on in R.
  • roxygen heart - Describe your functions in comments next to their definitions.
  • lineprof - Visualise line profiling results in R.
  • packrat - Make your R projects more isolated, portable, and reproducible.
  • installr - Functions for installing software from within R (for Windows).
  • import - An import mechanism for R.
  • Rocker heart - R configurations for Docker.
  • RStudio Addins - List of RStudio addins.
  • drat - Creation and use of R repositories on GitHub or other repos.
  • covr - Test coverage for your R package and (optionally) upload the results to coveralls or codecov.
  • lintr - Static code analysis for R to enforce code style.
  • staticdocs - Generate static html documentation for an R package.

Logging

Packages for Logging

  • futile.logger - A logging package in R similar to log4j
  • log4r - A log4j derivative for R
  • logging - A logging package emulating the python logging package.

Other Tools

Handy Tools for R

  • git2r - Gives you programmatic access to Git repositories from R.

Other Interpreters

Alternative R engines.

  • CXXR - Refactorising R into C++.
  • fastR - FastR is an implementation of the R Language in Java atop Truffle and Graal.
  • incanter - Clojure-based, R-like statistical computing and graphics environment for the JVM with Lisp spirit.
  • pqR - a "pretty quick" implementation of R
  • renjin - a JVM-based interpreter for R.
  • rho - Refactor the interpreter of the R language into a fully-compatible, efficient, VM for R.
  • riposte - a fast interpreter and JIT for R.
  • RRO - Revolution R Open.
  • TERR - TIBCO Enterprise Runtime for R.

Learning R

Packages for Learning R.

  • swirl - An interactive R tutorial directly in your R console.
  • DataScienceR - a list of R tutorials for Data Science, NLP and Machine Learning.

Resources

Where to discover new R-esources.

Websites

Books

  • R Books List - List of R Books.
  • The Art of R Programming - It's a good resource for systematically learning fundamentals such as types of objects, control statements, variable scope, classes and debugging in R.
  • Free Books - CRAN Contributed Documentation in many languages.
  • R Cookbook - A quick and simple introduction to conducting many common statistical tasks with R.
  • Books written as part of the Johns Hopkins Data Science Specialization:
  • R Packages - A book (in paper and website formats) on writing R packages.
  • R in Action - This book aims at all levels of users, with sections for beginning, intermediate and advanced R ranging from "Exploring R data structures" to running regressions and conducting factor analyses.
  • Use R! - A series of inexpensive, focused books from Springer aimed at practitioners. Books can discuss the use of R in a particular subject area, such as Bayesian networks, ggplot2 and Rcpp.
  • R for SAS and SPSS users - An excellent resource for users already familiar with SAS or SPSS.
  • An Introduction to R - A very good introductory text on R, also covers some advanced topics.
  • An Introduction to Statistical Learning with Applications in R - A simplified and "operational" version of The Elements of Statistical Learning. A free digital copy is provided by its authors.
  • The R Inferno - Patrick Burns gives insight into R's ins and outs along with its quirks!

Podcasts

Reference Cards

MOOCs

Massive open online courses.

Lists

Great resources for learning domain knowledge.

Other Awesome Lists

Excel

posted @ 2016-05-24 19:19  li_volleyball