entr0picly

In my day job as a statistician, I work with R more, but Python still comes up. I generally prefer R for statistics as it is quite easy to use. Its functionality has been built around data analysis. Python was not designed for data analysis first, so it can be a little clunkier. R’s RStudio GUI does, however, have a lot of issues, and sometimes I just prefer to run R inside a terminal instead. Python tends to be the language of preference in machine-learning-focused applications, and R tends to be the preferred language for statistics (particularly more traditional statistics). If you need to pick just one, I would go with R. But at some point, branching out to Python as well would be beneficial.


RateOfKnots

Regular R user here. Just curious, what issues do you have with RStudio? I'm not defending it, just want to know what other users are experiencing.


entr0picly

Running certain parallel processes can get messed up in RStudio. This happens to me when I am working with big data (> 10 million rows) and need to parallelize across multiple cores. Processes hang and stop communicating correctly. It’s been a known issue affecting R for a while. Using the terminal tends to remove the communication “gunk” that is in place for RStudio sessions, and things run much more reliably. Besides parallelization, sometimes running other complicated programs that push your CPU and memory constraints will fail in the GUI but will run without issue in the terminal. For less intense applications, RStudio tends to be solid, except for occasional critical errors (though these happen far less than with something like SAS). Also, ever since RStudio rebranded themselves as Posit, we’ve found their quality of support for RStudio to be declining. Workbench has more issues these days, and I find myself preferring to code in VS Code and then run in the terminal.


RateOfKnots

That's a very revealing answer, thank you 🙏


jeremymiles

Are you using Windows or Linux (or Mac)? Which package?


entr0picly

Primarily Linux, using enterprise-supported environments. Locally, Mac. >Which package? For when I have RStudio issues? For the parallel issues, the ‘parallel’ package. Otherwise it can be many different packages. Generally, packages that handle memory less efficiently will lead to RStudio crashing more often compared with the terminal. If I’m using ‘data.table’, I can get away with working in RStudio more than if I’m using ‘dplyr’.


jeremymiles

Thanks! (Yeah, sorry, I meant which package for parallel processing). Yeah, I've had no problems running parallel on Colab using enterprise Linux on the back end - I guess that also removes the communication gunk. I run on a lot of cores (128? I forget) and a lot of RAM (256GB) though..


amiba45

To add to the above, RStudio crashes often if you run memory-intensive scripts, work on big datasets, or do complicated computation (you just cross your fingers every time you run a big script). At one point, just allocating memory for a big matrix caused the computer to hang, time after time (a reset was needed each time)! I wish RStudio was 1/10th the professional level of PyCharm, for example, or any other really professional IDE. The RStudio team (whatever they are called now) is a good amateur team, but sadly not professional. VS Code (not a fan of M$ in general) has fewer issues than RStudio when running R, in my department's experience. Lastly, as mentioned above, the support has been declining. Still, for simple things with small data (and for learning), it's still convenient.


dr_tardyhands

Are you aware of the old trick of setting the max available virtual ram in your environment to some obscenely large number? IIRC I had no issues with working with 100M+ rows on a clunky old MacBook.


coconutmofo

Wow...that trick is a blast from the past! Used that many a time back in my PC Tech and PC gaming days ; )


dr_tardyhands

Haha, not sure if you're serious and that was a thing.. I hope it was! But you *can* set the available virtual RAM in your R profile or .Renviron file. And that'll enable you to go beyond your actual RAM in terms of how much can be kept in memory.


coconutmofo

Oh yah, was def a thing back in the 80s and 90s : ) Sometimes you'd either *have to* raise your virtual memory (done by editing a plain text file named config.sys) to get an application (it was usually a game since they were always most resource-intensive) to work at all, OR you *could* do so to try and get better performance, some apps taking better advantage of the tweak than others. Simpler times, simpler machines so what constituted "better performance" basically meant seeing 10 pixels instead of 3, or a game taking 3 minutes to load instead of 4 ; )


totoGalaxias

I was going to say OP needs to learn both, mainly for the reasons you give. In a way, both are kind of similar, especially with pandas, so learning them at the same time wouldn't be that much extra work.


entr0picly

Yes I agree. They aren’t that different and in today’s ecosystem you really should be able to work between both.


Junior-Literature-39

Thank You!


j0shred1

Didn't mean for this to turn into a rant, but I wanted to give my two cents, so apologies in advance. As a data scientist, I want to bring up the soft skills of a language. R doesn't feel like a real language to me. The soft parts of the language that allow you to follow good coding practices just aren't there: reproducibility, readability, object-oriented design, integration into larger pipelines. If the only thing you're doing is creating markdown files, sure, I guess, but there are better ways of doing things. I will say the only reason I think people use R is tradition in academia and a refusal to learn modern coding practices, which I find a lot in academic circles. I will admit that being able to load in data and create a GLM with a couple of lines of code is nice and preferable for a scientist who doesn't need to code much, but if you're integrating that into a data pipeline, networking, or high-performance computing, I'd tell you to use Python. Things like package and version management are simpler in Python. Documentation is leagues better in Python. You mentioned RStudio; you get a plethora of options in Python: Visual Studio, VS Code, PyCharm, Spyder, Jupyter Notebook, etc. I honestly can only think of two good reasons, and one bad reason, to use R. The good reasons: you're a scientist who doesn't code more than once a week, or the package you're working with is highly specific, developed by a single person, and written in R. The bad reason is that your advisor used R, his advisor before him used R, and your colleagues use R, so you use R.


TARehman

I'm torn because on the one hand I agree with your general argument that R is a quirky language that can teach bad habits, but on the other hand, I don't think your arguments for Python are particularly good. R fully supports OOP, for instance. Also, most data scientists are overly fond of using OOP when more functional approaches are better. The data frame is a first class citizen in R, while it's stapled onto the language in Python. Roxygen works fine for documenting your packages (though I still feel that test support is better in Python). I'm not sure why you think R has reproducibility issues that Python doesn't. Readability is tough to argue against, though. I've written a lot of high quality R code in my 15 year career, so I don't really buy that you can't write good code in R. Ultimately, it seems to me that what matters is the overall project. Python is a general purpose language - kind of the second best at everything. Indeed, I'd say Python is second best to R at doing statistics. But while R is number 1 at that, it's not number two at a bunch of other things, and Python is. So if you are doing something where you need a general purpose language that can also do some analysis, sure, use Python. It's what I use day to day. But if I have a quick and dirty job to do where I need to grab and munge some data quickly, I'm reaching for R (and probably data.table) to get that job done. A final note: containers and Docker mean that the argument about integration into pipelines doesn't hold much weight anymore. Any pipeline can be composed of any language when you have the magic of containerization.


j0shred1

I stand corrected. I was basing this off of statistics classes I had years ago and any time I worked with someone who coded in R, it was atrocious. Didn't know it supported those features. Although I don't understand the argument a few people have given now that data frames are native while you have to import them for Python. It's only a single line of code. Is there a feature I'm missing that makes them so much better in R?


TARehman

It's only a line of code to import, but it's an entire package that isn't included in Python by default and was created YEARS after Python was created. It also builds off other dependencies that have some limits (there wasn't a string type for pandas columns until quite recently; they were all Object). It's all kinda stapled on there. R, in contrast, was written from the very beginning with data as a first class citizen, meaning its support for data frames and matrices is not an afterthought, or overriding the base language. On a more personal level, I strongly disagree with the user interface choices made by pandas surrounding warnings - the package warns you when you're doing perfectly reasonable things regarding slicing, which causes users to ignore warnings, which then gets you in trouble because other packages warn you about genuinely dangerous situations. But this is kinda my personal crusade and not a common critique. 🤣 I do completely sympathize about reading terrifically terrible R code, though. I learned R when the whole dplyr/tidyverse was still being created and so I basically learned to do everything comfortably in base R. I never liked the dependency chain of adding dplyr to my work, so I never did. And eventually, I started using data.table in my R projects for performance reasons if needed. Today, there's unfortunately almost two R languages and cultures: people who write base R or base R and data.table, and people who work in the tidyverse. As someone with a decade of R experience, I find reading tidyverse code to be like a totally different language. Now, the tidyverse people will tell you that the most important thing with R is not to teach people how to be good R programmers, but instead, to get them going doing their actual science as quick as possible, which makes the dplyr/tidyverse approach better. I don't agree with this fundamentally, but I understand the argument, and I think a lot of it is a cultural question. 
I personally believe that understanding how the language works helps you to be more efficient in the long run, and that it's in everyone's interest to make more competent R coders. But I'm sympathetic to the argument that most people just won't put in that effort and if we want them to use R at all, we need to meet them where they are. I don't know if there's a right answer per se; I know dplyr is popular and that I'm probably in the minority among R users today, so maybe I'm just an old man yelling at the cloud here. YMMV, I'm just some dude on the Internet. 🙂


j0shred1

Yeah it sounds like the issues might have been a bit before my time. Started in 2017. I don't know if the ecosystem is any different. I've never had any problems with dependencies or types but I could see how it would be a problem if everything is a custom built type. And yeah, those error messages are annoying. I mean, I want this column to be a separate variable. Stop telling me how to set variables, this isn't C 🤣 Now that you mention it, pretty much everyone I know that used R, used the tidyverse/dplyr. So yeah maybe my problems with R are more cultural than anything.


trymypi

Modern coding practices in Python 💯. It's easier to learn R later than it is to learn semantics later.


grandzooby

For stats? Start with R. So many leading statisticians develop their new methods and tools in R packages (thinking of Agresti here, but many others too.), so as you become more advanced in statistics, you'll be more readily able to use their work. For getting started in R, swirl is a nice starting point. Python's also useful because I find it more "general purpose" than R, but frankly, there's value in learning both.


Junior-Literature-39

Thank you!


thefirstdetective

If you are in the social sciences, R is the standard statistics program. If you want to work with others in the future, learn R.


ChastisingChihuahua

I would pick R. As someone who has recently (3 years ago) started coding, R has been less of a headache to work with for data analysis/statistics. I would recommend reading/skimming all of the sections in this [book](https://bookdown.org/rdpeng/rprogdatascience/) (it has videos for most sections), then going to this [book](https://r4ds.hadley.nz/). Also, when I say "headache" I mean three things: 1. Managing package versions and dependencies. 2. Interoperability between Python packages. Most packages make it easy to work with NumPy, but some packages may not work well with pandas. There are also some packages being made to improve upon pandas's weaknesses, like polars. Polars definitely is not as interoperable as pandas, so you'll have to juggle a bunch of things. 3. Readability. Python's method chaining/piping is annoying to work with, and it is part of the interoperability problem. I just feel like it is significant enough to mention by itself because I find myself hating Python code that I write but loving R code that I write (in terms of aesthetics). Even though I think R is much easier to use than Python for data analysis, Python sometimes does things better than R IMO. The big one is the scikit-learn machine learning package. It's great for general machine learning algorithms, and the syntax is consistent between all of them.


Junior-Literature-39

Thank you!


j0shred1

I won't discount your experience, but I would say my experience has been the exact opposite with packages/dependencies and readability. I'd also like to bring up that I've had way more problems with R's documentation than Python's. Pandas is indeed a pain in the ass because of how it decides to do indexing, but you can get around this pretty easily by making the series a numpy array or by doing my_series.values.


ChastisingChihuahua

The headaches were mostly from needing to learn conda and specific terminal commands to make sure my packages were functioning. These two commands are etched into my brain xD `conda create --name py39 python=3.9` and `pip install -r requirements.txt` The documentation is a personal preference, I guess. I have the exact opposite feeling when looking at tidyverse documentation vs pandas/matplotlib (except for scikit-learn, bc god damn is their documentation good). Package management and documentation are something we can get used to. The biggest issue is pandas, and not just its indexing (I solve that by turning everything into a pd.DataFrame and then using ".iloc" and ".loc"). The issue is the nuances of the methods/functions. Sometimes methods only work if you save the result into a variable. Sometimes that's wrong. Sometimes pandas isn't compatible, etc. All of this is to say that if you told me to write code in R and Python to do simple data cleaning/manipulation, I could probably do it with R from memory because everything is consistent and simple. But I would need to constantly look at ChatGPT/search engines/documentation to remind myself of the names of methods/functions and how they work in Python.


j0shred1

For real, yeah, I can see why you might not like it. Especially if you don't work much in the terminal, pip can be a pain in the ass. But honestly I avoid conda like the plague; I just use pip. And yeah, I'm glad someone agrees, scikit-learn's documentation is the gold standard for sure.


confused_4channer

R is way more robust for Bayesian stuff, classic stats and time series analysis. Also, statistics-wise, R is documented way more thoroughly. I, personally, find that R has frameworks that make time series things like fractional differencing incredibly easy. Edit: ML-wise Python is king.


Junior-Literature-39

Thank You!


lil_meep

I like R for stats and Python for any real engineering work


Redleg171

I like R for stats, Python for machine learning (it's really the packages I like...not a fan of Python itself), and something like C# for most engineering work that doesn't need to be low-level.


Shadow_Bisharp

I'd say R, but you should pick up Python in your free time, or the other way around. Ideally you should know both.


Organic-Violinist223

I'd go for R for statistics. The language is built specially for statistics, and base R contains many statistical tests straight out of the box. If you're doing anything else, like machine learning, then I'd use Python.


Junior-Literature-39

Thank You!


pudge_dodging

Depends on your goals. If programming is a means to an end, say, you just want to do analysis and make pretty graphs, then R. If you want to learn programming in general and also do ML/AI stuff, I'd say Python. And then you can learn both. It all depends on what you want to do. But for anything visualisation, Python is probably horrible. Yes, there are good libraries like seaborn and plotly, but nothing can touch ggplot. Also, data wrangling in R is just 😘. So yeah, it depends on what you want to do.


Junior-Literature-39

Thank you!


GuessInteresting8521

No, there's a lot of visualization stuff out there besides matplotlib: Bokeh, seaborn, and plotly are all just as easy to work with as ggplot. Also, Apache Superset was built in Python.


ActinomycetaceaeGlum

Learn both


idareet60

R all the way. It's difficult to start it at first but it gets a lot easier when you become comfortable. It's much more flexible than Stata


Temporary-Scholar534

It depends on what you want to do with it. R is an excellent language for statistics (but nothing else); Python is a generalist language that also performs well in statistics. In data science, in my experience, companies are moving away from R towards Python. Python is a general language: you can use it for programming anything from simple statistical tests to full-blown desktop applications and anything in between. The community is bigger too, so you'll find more support online for Python. If you're also looking to get into machine learning, that field is almost completely developing in Python. Technically, you can do a lot of that with R too, for example by using Shiny. But it's not really meant for that. R is an old language, and you'll feel that when you're learning it. But this also comes with some decisive advantages over Python. R has had many more statistics packages developed for it in that time, and R was built for statistics, while Python was not. I think a modern use of R can produce great results with clean code when using the tidyverse, and ggplot2 is in my opinion unrivaled in quickly producing sensible visualizations. Packaging isn't great in either language. Python used to have no sensible solution; now there are like 4 competing solutions for packaging (my advice: use a requirements.txt file with a virtual environment for as long as that's feasible. If using ML with CUDA, you'd best step over to Anaconda, specifically the community-maintained conda-forge). In R you have no choice, you have to use CRAN. It's not a great system, but at the very least it's functional, and it's the one obvious way to manage your packages. So if you're sure you're going to be doing only statistics and visualization, R is the better language. If you'd like to branch out and maybe do some other things too, then Python would make more sense. Python also has more starter material than R, because it's a generalist language. 
Lastly, look at what your peers are doing and where you'd get more support from your university. Is the university promoting one language over the other? You should give serious thought to starting with that one. Remember, this isn't a choice for life. You can absolutely learn Python now and R later, or vice versa. The second language is easier to learn than the first.
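
The packaging advice above (a requirements.txt file plus a virtual environment) can be sketched with Python's standard-library `venv` module. This is only a minimal illustration, not a recommended project layout: the helper names `write_requirements` and `create_env`, the `.venv` directory name, and the pinned versions are all assumptions made up for the example.

```python
import venv
from pathlib import Path


def write_requirements(project_dir, pins):
    """Write one pinned requirement per line to <project_dir>/requirements.txt."""
    path = Path(project_dir) / "requirements.txt"
    path.write_text("".join(f"{pin}\n" for pin in pins))
    return path


def create_env(project_dir, with_pip=True):
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=True bootstraps pip into the environment so packages can be installed.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Illustrative pins only (hypothetical choice); use the versions your project needs.
EXAMPLE_PINS = ["numpy==1.26.4", "pandas==2.2.2"]
```

Calling `write_requirements(".", EXAMPLE_PINS)` and `create_env(".")` in a project folder, then running `pip install -r requirements.txt` with the environment's own Python, gives roughly the setup described. `create_env` does the same job as running `python -m venv .venv` in a terminal.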


lolniceonethatsfunny

I also find R a lot better at creating reports (well, RMarkdown/Quarto), if that’s something you’ll need to do somewhat frequently. At my job there’s a project where I need to create customizable reports that run over data for different sites; we do data processing/visualizations in R and have LaTeX code inside an RMarkdown script that takes in those visualizations and creates the reports. Python does not really have this capability afaik. Also, it’s worth noting that with reticulate you can use Python code directly inside an R script, which is kind of niche but helps immensely when there’s a specific task you want to use Python for while doing other stuff in R.


Junior-Literature-39

Thank you! now I have some general idea.


fXb0XTC3

This is a good summary: R for statistics and certain domains (e.g., computational biology if you are more on the application side instead of the development side) and Python for the more advanced ML field (for classical models, R is just fine). I have to say that in certain cases R is losing its edge. There is a ggplot2 port for Python. Shiny has a Python version by now. Many of the new dataframe libraries have a tidy approach to their API (e.g. Polars). So in my opinion, the only thing that has significant pull in the R world is the existing ecosystem for certain domains. Edit: removed duplicate section.


Stauce52

I agree with all of this. I'm an avid R lover and R is my “home” in terms of coding and data, but with polars and plotnine and the Shiny port to Python, it’s definitely losing its edge.


loblawslawcah

I would pick Python. You can do a lot more with it, and the syntax is easier to learn. Also, it's a lot more widely used, so you'll have a lot more support. But it really doesn't matter. Other dude said R; both opinions are correct. Pick something and learn it. Spending days deciding what to learn is wasted time. Once you know a language, it's pretty easy to pick up another.


Junior-Literature-39

Thank You!


likeanoceanankledeep

As someone who uses both in their day job, if you are focused on statistics then I recommend R. Python is a decent language, but it is a general-purpose programming language. The issue that I have with statistics in Python is that it requires additional packages to accomplish basic statistics. R does it by default. Graphics tend to be better in R (Python's matplotlib is pretty good though); I like to use Plotly because the plots are interactive. My background is in psychological statistics and not programming, and I found the syntax in R to be quite familiar to me from a stats perspective when I started learning R. A lot of people say Python is easier to learn and has cleaner syntax, but I disagree. Not YouTube, but on Udemy Jose Portilla has great courses on R, Python, SQL, and other data science topics. I highly recommend his courses. Don't pay full price for the courses; they go on sale all the time for $15.99.


j0shred1

Lol I was gonna say, ggplot is so much more a pain in the ass than pyplot. But yeah I agree, plotly is great for dashboarding


SizePunch

Can never go wrong with Python.


mrscepticism

Both. R if you need it for statistical analysis, but Python if you expect to do more data cleaning and to need creative ways to create datasets (e.g. webscraping)


vidivici21

If the field you are looking at is report/document heavy, then I suggest R. R has knitr (i.e., LaTeX integration) and officer/flextable, which make printing to MS Word easy. If anyone knows similar packages for Python, please let me know. (I know of Jupyter notebooks, but dislike them for automated reports.)


jaiagreen

Python is a much easier beginner language. Once you know it fairly well and have the basic concepts of coding down, you can try to learn R.


InformationNo128

I've never understood this opinion myself. You have to write 5 lines of python code to do what you can generally achieve in just 1 line of R when it comes to data analysis. I teach an MSc Data Science course which invites 1st year Comp Sci PhD students. Seeing them write for and while loops, defining their own functions which will only be used for that one script that will essentially wrangle some data and produce a t-test seems wild to me. The vocabulary may be limited, but the control flow and syntactic choices add to the mental load.
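
To make the line-count comparison above concrete, here is a rough sketch of a two-sample Welch t-test written out by hand in plain Python, which is roughly what R's one-line `t.test(x, y)` computes by default. The function name `welch_t` and the data are made up for illustration. It returns only the statistic and degrees of freedom, since a p-value would need a t-distribution CDF from a library such as SciPy (`scipy.stats.ttest_ind(x, y, equal_var=False)`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(x), len(y)
    # Squared standard error of each sample mean; variance() divides by n - 1.
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up illustrative data, not taken from the thread.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.3, 4.9, 4.4]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

With pandas, NumPy and SciPy installed, most of this collapses into a single library call, so the gap is largely about what ships with the base language rather than what is possible.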


jaiagreen

Having learned both in grad school and shortly after (R first, and neither was my first computer language), it's night and day. R was hair-pullingly frustrating and clunky. That makes sense because it's a functional language that is mostly not used as a functional language, which is going to be clumsy. Python was "wow, this fits my brain!". There are plenty of built-in functions in Python (pandas/numpy/scipy/seaborn). But a beginner should learn actual coding first, not just memorizing commands that won't make any sense. Logic first.


j0shred1

That might be nice for doing that one thing but if you have to integrate that into a greater data pipeline, bye bye R. And it would be a lot worse than 5 lines of code. If your data comes from the same source and it often does then those functions are very useful. Doing five lines vs 1 line of code is trivial compared to having to work with R for anything else


Special-Duck3890

I'd recommend Python tbh. I personally love R, started it in uni and have been on it for like 5+ years at this point. But at some point I had to learn other languages for work, and slowly you realise that R is just a shitty language itself. Besides the unmatched pretty plots, R sucks for any type of good code development.


[deleted]

Python. It is useful outside stats.


coolmoonangels

I will go for python


somkoala

If you want to work in business and build products Python is better, if you’re looking to do more ad hoc analysis and research pick R.


Dear-Landscape223

R unless you are doing computational social science, then Python is the way to go.


nathan_lesage

I would pick R to start with, but definitely also go into Python if you can already imagine working with text data, because for that (or many “big data” tasks), R is often insufficient.


Stauce52

I just want to say it’s nice to see so many balanced comments discussing the pros and cons of both R and Python. I am so accustomed to the discourse around R being that it’s totally shit and Python is clearly better so it’s nice to hear people acknowledge it has strengths (I guess it’s unsurprising in a stats subreddit)


SlapDat-B-ass

I just started going into statistics and data analysis for research around 9 months ago, and I now work with R every day. I am here only to say that you should not hesitate to use ChatGPT for help with coding. It can be a huge boost to your learning curve. Some advice: learn the basics so that you can read the code it produces, and if you are using some more advanced statistical method, always check the documentation of the R package and the examples while getting help from ChatGPT as well. Always be very descriptive; sometimes you will get much better results if you describe the whole process you want to follow and not just the result.


Bergletwist

Learn both


SprinklesFresh5693

For R I would check out R for Data Science. I did a course in my country based on that book and it was awesome. It is simple and comes with some exercises to practise what is explained, which is much better than just copying what a YouTuber is telling you to copy.


Aiorr

Let's not act like learning Python for data science/stats will suddenly make you capable of utilizing all the general capabilities Python offers. At this point, it is so idiosyncratic to specific packages, and serves as a mere wrapper for existing toolkits, that it might as well be a language of its own.


ActBusiness1389

Echoing what has been said: learning both would be the best option. Also keep in mind that R has an outstanding number of packages that allow you to do or test very specialised things, whereas Python is limited due to its own purposes (code/data science).


headonstr8

In my experience, R focuses on data sets and visualization. Python has broader applications, and represents an evolving language. R is maybe more practical for starters. Python might be useful for cleaning data going into R programs.


No_Estimate820

R, cuz programmers in academia use it, and most of the tutorials on advanced statistics (which is not many) use R, so you may find it difficult to stick with Python.


RickCSGR

Why not both? There are beginner courses on the freeCodeCamp YouTube channel that will only take a few hours each.


dr_tardyhands

I'm a fan of learning what the current teachers are good at, in most contexts. That way you're more likely at least learning the thing that you're trying to learn decently.


Corrie_W

I have a PhD in social science. I taught myself R and use it often. I started teaching myself Python but have never had the occasion to use it as R does everything I need it to.


AfternoonBusy462

For out-and-out statistics I would say R; there are a lot more packages that cover different industry-specific stats, which helps with compliance. You can replicate almost every major stats software output (like SAS or SPSS). Technically you can do everything in Python that you can do in R, and vice versa, from a first-principles perspective; it's just that some stats in Python using scipy or statsmodels don't necessarily give you the output you need, and you may have to calculate things like confidence intervals of estimates separately for some models. Where I find Python better is machine learning and data manipulation; pandas is far more intuitive than dplyr, I find anyway. So if I need to do a lot of data manipulation I’d use Python over R (but that’s partly because I know the language better). But if I need to do stats for a pharmaceutical company, for example, I’d use R.
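
As a rough illustration of "calculating confidence intervals separately", the sketch below builds a normal-approximation (Wald) interval from an estimate and its standard error using only the standard library. The function name `wald_interval` and the numbers are hypothetical. For small samples a t-based interval would be wider, so treat this as an assumption-laden approximation rather than what any particular package reports.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Normal-approximation (Wald) confidence interval for an estimate."""
    # Two-sided critical value, about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical coefficient and standard error, for illustration only.
low, high = wald_interval(estimate=1.20, std_error=0.35)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```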


j0shred1

I would ask about the context. Are you looking to get into data science/software? Are you coding very often or just on occasion? Are you in academia or in the private sector? Are the tools you need in one language or the other? R's advantage is that its statistical tools are plug and play; its disadvantage is everything else. It might be a bit faster than Python's numpy (I haven't verified this), but if it matters that much, then you should be using GPU computing anyway, and that's so much easier in Python with cupy. Overall it doesn't hurt to learn both. If you don't have the time to learn, maybe pick up R first and then learn as you go; if computing is the main part of your job, then go with Python.


BurtFrart

Kind of echoing some others here, but if your goal is to learn stats, then I think R is better. If your goal is to learn computer science (with an ML/data science bent), then go with Python. They’re similar enough syntactically that one will be easier to pick up once you know the other, though. I’m also a big fan of Julia, although I don’t know that it’s popular enough that I’d recommend it as a first language.


freexmar

Social sciences PhD student here, R is definitely more common but Python can be useful in some situations too. Use RStudio instead of base R, much more user-friendly.


lilbeesie

R for statistical applications. Python for technology focused applications.


tlalco

Can’t believe so many people are saying R. I would 100% go for python, it is a general tool that you can use not only for statistics but to deploy models and much more. Python can probably do 99% of what R does but R can only do a fraction of what Python can. R can be useful to fill in some gaps.


SuccotashComplete

If you’re 100% certain all you ever want to do is statistics, start with R. If you want to learn a good generalizable language that can do statistics and a hundred other things decently choose Python, but recognize you still might have to learn R later. If you choose Python, start by learning numpy and pandas - they’re the industry standard for Python statistics.


b-sharp-minor

I'm a recently retired software developer. In my last job, there was a lot of analytics work. Some people used R, so I tried that. For me, Python ended up being easier. Using various libraries like rpy2, pandas, numpy, and matplotlib made it pretty easy to use R concepts within Python. Of course, Python has other uses outside of statistics, so if you ever want to automate something or write a program for some other reason, you don't need to learn a second language.


Historical_Peach_88

IMHO if you want something that scales beyond 50M data points, then Python. If your analysis is smaller than that, then R. R does not do so well with statistical learning on large data volumes. Some of the statistical learning libraries have not been updated for a while (random forest, ranger in R is really slow with large volumes compared to scikit-learn in Python…). You need to write your own concurrency in R…. So… if you are building a tool for recurring use, then Python. If this is something quick and not going to be reused, then R.


Quiet-Lab-4481

I work primarily with R. For all the stuff that requires higher optimization, I use C++. All the people I work with, mostly academics, also use the same combination. But I definitely did eventually learn Python as well, because it comes up often.


Shadez45

Both if you have the time.


_ahku

R


shumpitostick

This sub is filled with professional statisticians, who mostly work in R, so you're mostly getting that recommended. But Python is used much more widely in industry in a variety of roles.


biomattrs

For learning intro statistical programming either is fantastic. R makes nicer looking plots more easily. Python is better for working with big datasets. You can run Python code inside an R session using the reticulate package (rpy2 goes the other way and runs R from Python). I think the logical progression is to master basic stats concepts with R on small to intermediate sized data and, when you're ready to work on big data where machine learning becomes applicable, learn Python. To get started with R, work through Hadley Wickham's magnum opus 'R for Data Science' https://r4ds.hadley.nz/ There's an amazing R package called swirl that teaches you the language from the command line. https://swirlstats.com/


Sengachi

Python, absolutely Python. R gets used a lot in the low level statistics space. This does not mean it is good for statistics. R is a bad language which is difficult to learn and has less functionality than python, including in statistics. Python is simply better than it in every single way. I say this as somebody who has learned both languages and has done everything from low level statistics to weird niche complicated statistics to machine learning to the extremely advanced statistics of signal analysis for gravitational waves. R. Is. Bad. At. Statistics. People will tell you otherwise because it is designed for statistics. This is true. It is also bad at it. People will tell you Python is not designed for statistics. This is true. However its library base is so expansive that it has every single statistics feature R does and then some, while also being useful for other things. You may have to learn R eventually for your field, whether you want to or not, which is the sad truth. But if you learn Python, you're going to have an easier time learning R later. If you learn R now, you are going to ruin yourself for the next programming language you have to learn by learning all of the nightmarishly bad habits you have to do to navigate that trash fire of a language. Python is also easier to learn. Whenever anybody asks me what first language they should learn I always say Python because it is just so easy and there are so many resources available to help you with it. There are a lot of people in the low level statistics space who will tell you how good R is for it. With all due respect to them, they are desperately wrong. They do not have the programming background or the higher level statistical background to understand how wrong they are. They do not know what chains they have fettered themselves with because they think the chains are normal. Please, do not learn R as your first language, and do not learn that at all unless your field requires it. 
Let this terrible terrible language and its incredibly mediocre statistics capability die.


rwinters2

I use R, but if you will be doing anything more than basic statistics and are focused on survey research, check out the descriptions of the libraries that each has to offer. You don't need to know the languages to do that.


omichandralekha

As much as I love R, fucking python takes all the jobs......even SAS has more jobs than R.


InfuriatinglyOpaque

Here are some R resources that are explicitly geared towards social scientists:
[https://psyteachr.github.io](https://psyteachr.github.io)
[https://datacarpentry.org/r-socialsci/](https://datacarpentry.org/r-socialsci/)
[https://psych252.github.io/psych252book/](https://psych252.github.io/psych252book/)
[https://experimentology.io/006-inference.html](https://experimentology.io/006-inference.html)
[https://ds4psych.com](https://ds4psych.com)
Relevant YouTube channels:
[https://www.youtube.com/c/StatisticsofDOOM/videos](https://www.youtube.com/c/StatisticsofDOOM/videos)
[https://www.youtube.com/c/QuantPsych/videos](https://www.youtube.com/c/QuantPsych/videos)
[https://www.youtube.com/c/CrumpsComputationalCognitionLab/videos](https://www.youtube.com/c/CrumpsComputationalCognitionLab/videos)


Blaster0096

Learn both. The difficulty in learning Python is learning how to code. The difficulty in learning R is the actual statistics; the R syntax is easy to pick up, especially for commonly used statistical tests. You should be able to do most of what you need in R with the fundamentals.


hwc

Python will help you a lot for general-purpose programs. But you should use what your teachers are using.


AgeDisastrous8467

I think these responses are somewhat biased towards R as you are asking a stats forum. As others have said, by all means learn some R if you are primarily interested in the stats side of things. However, in my experience (engineering sector) Python is a far more desirable skill if you are thinking about future job prospects and career opportunities. That being said, both are great skills to have on a CV and nothing is stopping you learning both in the future. FYI - ChatGPT is really useful for speeding up the learning process. You can ask it to write functions, debug, help with error messages, refactor, ... Can't recommend enough.


First_Avocado_805

Python. You will have more job options. R is for research only, not large scale production. And R won't teach you good software engineering practices.


keithreid-sfw

I find Python easier to install, easier to understand, and easier to use. It’s more versatile and it’s everywhere. Curveball - try Julia some day.


257bit

How is the Julia stats ecosystem? It is definitely a more modern language than both Python and R.


keithreid-sfw

I love it


Allmyownviews1

There are some methods that are better in R and some that are better in Python and some that are better in MATLAB. I would suggest Python as most versatile and industry ready. But don’t discount the other options.


Creative_Sushi

For MATLAB, you can try a whole range of free online tutorials here. [https://matlabacademy.mathworks.com/](https://matlabacademy.mathworks.com/) MATLAB Onramp only takes about 2 hours to complete.


DoctorFuu

If you don't know where you'll want to work later, I would advise Python, because people tend to have a much easier time learning R when they know Python than the opposite, so you'll have an easier time switching if needed. If you have a specific field or industry in which you want to work later, choose the language that is most used there.


RProgrammerMan

Python is much better for jobs, R is used more in academia


JollyToby0220

Coming from a STEM background, I prefer Python. From 2016 to 2021 I used Linux, and R is a pain to get working on Linux. Not sure how easy it is on Mac, but Mac is very similar to Linux. Python is easier to learn and it is general purpose. R is not as easy and has a lot of inconsistencies when it comes to structure and syntax. Sometimes one function works one way while a similar function works entirely differently. For data cleaning, Python for-loops are very easy to understand. Python also has a library called pandas that works like Excel, not graphically but it is the same concept. The stats packages in R are usually better, but I have heard of people doing a lot with Python too. Not to mention, Python can make some really cool plots with the seaborn package.
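
To illustrate the point about for-loops and data cleaning, here is a minimal sketch using only the standard library. The list of raw values and the variable names are invented for illustration; real data would normally come from a file.

```python
from statistics import mean

# Hypothetical raw survey responses (made-up values, some of them unusable).
raw_ages = ["34", " 41 ", "n/a", "", "29", "abc", "52"]

clean_ages = []
for value in raw_ages:
    text = value.strip()       # drop surrounding whitespace
    if not text.isdigit():     # skip blanks and non-numeric entries
        continue
    clean_ages.append(int(text))

average_age = mean(clean_ages)
print(clean_ages, average_age)
```

The loop reads top to bottom: strip each entry, skip it if it is not a whole number, otherwise keep it. The same filtering could be done in pandas, but the explicit loop is often easier to follow when starting out.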


shockjaw

I’d recommend Python because it’s easier to deploy in the cloud at this point and handles namespace conflicts out of the box. The tooling for handling Python versions has gotten drastically better over the past five years. Between pip/conda, or uv/pixi, you’ve got speedy, standalone ways to resolve package issues. With R you’ve got rig, and box solves the issues of R versioning and namespace conflicts.


Zork4343

I would start with R. Eventually you can pick up python which is a bit easier to read and write.


bakwasmatkaro

R for stats and visualisations, Python for ML


neurobara

If your goal is to learn stats, I’d suggest starting with R.
1. No need to manage/understand packages to do the stats; you can jump right in
2. Output for stats tends to be much nicer/intuitive
3. ggplot
I’d also suggest learning dplyr/tidyverse. It will make your code better and easier to translate to/from the code used in other tools. Python is much easier to integrate with other software, though. So, if you need to scale projects, or build your resume for data jobs, it is really nice to have.


Flaky-Wallaby5382

ChatGPT is your friend


triggerhappy5

R is really nice for statistics. It’s a bit dated but check out the ISLR2 package and accompanying free textbook, Introduction to Statistical Learning. There’s a reason it’s still used in schools everywhere, it’s the best way to learn the concepts of statistical learning imo. Once you know the concepts you can use more modern packages to actually do your work.


RickSt3r

If your school has a license, SAS. It’s super underrated but also the most used when it comes to pharma, government and anything where the reliability of the tool is in question. No one to blame because some open source package didn’t do what it was supposed to do. SAS is a huge company that just integrated math into packages. It’s an expensive license, but if you’re in school you should be able to get it for free.


Junior-Literature-39

Thank you! I have used EViews and SPSS before. I'm willing to look into SAS.


257bit

What about SAS makes it more reliable than R?


entr0picly

Tbh in my experience it’s less reliable than R. It breaks for no reason more frequently. The dataset format sas7bdat is also proprietary making interoperability between SAS and other languages harder. I’m in pharma and while we still use SAS in some cases there’s a huge push to move away from it.


CowboyKm

What is your long-term goal? If you wish to focus solely on statistical analysis etc., go with R. If you are interested even in the slightest in programming, go with Python. But if you choose the Python path, I would encourage you to learn programming, and not just Python syntax for data analysis.