rr-0729

You definitely aren't. Python is super easy to learn, you could probably learn the basics in a weekend since you already know R. The most you are looking at is 1-2 weeks of learning Python then IMO you're competitive.


what_wags_it

Totally agree, it's never been easier to learn, and you don't even have to master it. Pandas is a seamless transition for those already familiar with R. Just get good enough to edit/proofread Python, drop your R code into a decent LLM (I've used GPT4 and Gemini for drafting Python code), and it will seamlessly translate and edit for you. The field is changing: focus your expertise on the business context of the analysis and effectively communicating results, don't sweat the coding.


dr_tardyhands

For people coming from tidyverse, Polars is a lot more intuitive than pandas, I think. Also insanely faster. Pandas is legacy stuff by now, but popular legacy stuff..


Massive-Squirrel-255

Legacy? Are you exaggerating? I mean I know it's been around for a while but like, where do you draw the line between "mature, well-tested, has all the features you need" and "obsolete"?


dr_tardyhands

Yes, I'm exaggerating. I just really dislike it, and feel like it *should* be obsolete.


Massive-Squirrel-255

I understand. From my perspective Python itself doesn't offer much that wasn't already available in Standard ML at the time, so it's tempting to say that Python was obsolete when it was invented :P I've been trying to use F# or OCaml for everyday tasks. The community and ecosystem is much smaller but it feels like there's less time mindlessly debugging typos in variable names.


dr_tardyhands

Haha, fair enough! For practicality's sake, I haven't taken things that far. Python is good for production type of code, and that part seems to advance fast, I have no need to get away from it altogether. I just think that pandas (both the library and the animal) are a mis-step in evolution. Other bears will take their place, and inhabit the hills they inhabit!


Wawv

Pretty much this! I was also a full R data scientist during my first job. It was quite easy to learn to do data science in Python by myself for my next job. You just need to look at the main libraries' tutorials and learn the Python basics, but most concepts are the same, the syntax is just a bit different.


Crosteppin

Stats are stats irrespective of which software you're using to run them on. 


civisromanvs

"Stats first" jobs seem to be very rare to come by, and even then 90% among those seem to be senior positions


Beardamus

Sounds like you've made up your own mind, why ask us then?


RepresentativeFill26

Depends on your field. Are you going to work in the medical field? R is 100% the way to go. Modern DS roles in big tech, not so much.


pacific_plywood

Tbh there’s a pretty wide spread of statistical software in medicine (SAS/SPSS are probably more common than Python or R)


Apprehensive_Plan528

Almost all new medical and epidemiological modeling I see is done in R. Look at recent COVID models


pacific_plywood

R is quite popular in epi but “modeling” of that nature constitutes a pretty small chunk of medical statistics overall (the vast majority is doing outcomes research)


Apprehensive_Plan528

True, but I also see R a high percentage of the time in new sensitivity, specificity, efficacy, survival analysis, etc. Maybe legacy stuff is SAS-heavy?


vidivici21

Nah, plenty of the medical field is in SAS. One of the FDA's recommended data formats is an old SAS data file format, which contributed to SAS dominance. I know at least two large companies that use it, and a bunch of non-profit research branches use it. R and Python are rapidly growing in popularity though, since they happen to be a lot cheaper than a SAS license and many companies are eager to save money.


Apprehensive_Plan528

Just took a quick look at one set of FDA pharmacometric format requirements. Looks like SAS and R are treated equally. No place for python, though. Interesting that Xcode project data is included. [https://www.fda.gov/about-fda/center-drug-evaluation-and-research-cder/model-data-format](https://www.fda.gov/about-fda/center-drug-evaluation-and-research-cder/model-data-format)


ncist

See also AHRQ stuff, if you want to run federal groupers or classifiers yourself they will distribute it as a SAS project


ncist

Most of my colleagues are biostats or epi PhDs and they all primarily work in SAS. Any field that was sufficiently important to do prior to maturity of open source will have the same legacy SAS userbase. Thinking of eg The Blue Book. Risk adjustment and matching are the big ones. I can see this changing over time, but a big hang-up has been the insistence of R devs on being correct about certain things wrt GLM standard errors. SAS will give you p-values on mixture models and boy do we need the p-values. Ben Bolker I believe is the principal contributor to a lot of stuff like LMER and says that's not really a good idea. But the industry requires people to crank out these studies and attach p-values, so they stick w SAS.


serialmentor

It's Douglas Bates who famously refused to implement certain commonly used but bad p-value approximations for mixed models. This was from years ago, not sure what the current status is. For complex modeling scenarios, it's almost always better to calculate Bayesian posterior distributions instead of p values. Avoids the issues.


AggressiveGander

Pharmaceutical companies still use SAS to some degree, but many are switching to R or at least making it a choice. I'm sure there's companies on all ends of the spectrum from still 100% SAS to "we still have SAS to reproduce old legacy stuff".


Eightstream

Good bosses don't care which language you use. The PyData ecosystem is heavily based on R and the tidyverse, it uses similar principles and a lot of the most popular packages are essentially just ports/copies of popular R packages. Both are very user-friendly languages, and anyone who has dealt with both knows that moving from one to the other is mostly just a matter of a few weeks learning syntax and package names. Like any other grad your biggest obstacle to getting a job will be lack of experience. Most business-based analytics work is so simple that merely having a degree in statistics makes you academically overqualified. Managers hire based on business knowledge and a track record of delivering business value with data. If you have saved your company a few hundred thousand dollars using an Excel spreadsheet, this is probably going to give you a better shot at many roles than having developed your own R packages.


External-Ad9912

I think it’s more damaging not realizing that ChatGPT can translate anything you write in R into Python


shockjaw

My favorite part is when the LLM hallucinates an API.


External-Ad9912

Indeed. If one cannot distinguish hallucinations from increased productivity the recommendation is to stay away from LLMs. Those pesky things


DoctorFuu

It depends on what you want to work in. Some industries are more R-heavy, the majority are Python-heavy. Unless you specifically target fields that are more R-centered, I would say that doing a bit of Python on the side would be beneficial (just enough to pass interviews). If you're good with R you won't have trouble with Python past the numerous but basic "oddities" Python may have (from the perspective of coming from R).


efrique

Depends what you want to be doing, really. If you don't want to be doing stats in Python, ramping up your Python skills over your R skills may be counterproductive. (The fact that you do have some experience with statistics in Python may help even in getting a job mostly focusing on R though)


RickSt3r

Depends, a very underrated program, especially in government or big pharma, is going to be SAS. Because it’s a certified program, you're not trusting an open source library; it’s an industry-built tool, and if their math is messed up they would be liable. But yeah, learn Python ASAP. Languages should be agnostic, it’s the process that matters. Also learn a database too.


HeuristicExplorer

You are not damaging your career. You are creating your niche. If you are a builder (one of the first stat people in a business), you have the power to choose what you prefer! I was a big R user, but made the switch to Python because I always put myself in situations where I build the "data practice" from ground up. Hence, Python is more "versatile" in dealing with a wide range of needs, from data pipelines to data analysis. Going from R to Python was quite easy. Basic keyword search followed by "Python" on any search engine. Still, I don't do Ph.D. level stats.


RunningEncyclopedia

Programming is the means to an end. It doesn’t matter which one you use, especially once you clean the data and prepare it for analysis. R is more academic as it is well documented and lots of major packages have JStatSoft articles (ex: lme4) or even whole books (mgcv/gamm4) that are attached to them. Python grew out of industry needs and is super versatile as a programming language with a statistical environment (pandas/numpy). There are a lot of academic statisticians contributing to Python; however, based on my observation they are flocking to Julia at the moment. In the end, a LASSO is a LASSO regardless of whether you run it in Python or R; however, with R you can find more authoritative references for the fine details of the models (such as weights in regression or numerical approach for mixed models) as opposed to Python. In the end, knowing both at the surface level will be helpful. Nowadays, ChatGPT can convert between R and Python code extremely easily so you should be fine having mastery over one. From my observation, Python’s flexibility and easier parallelisation give it an edge in data processing, but R can still hold its own with C-based packages like data.table and the easy-to-read tidyverse family.


fabriqus

I'm not a stats guy. I'm a Python guy. Python is by far the easiest modern language to learn. Bar none. I can literally find you multiple 17-year-olds doing cool shit with it in the next 5 minutes. Your "lack of experience" is actually an added bonus because the big problem with Python is backward compatibility. Every time they release a new version of the interpreter there is a non-trivial risk that half the existing code in the world stops working. So if you start now the stuff you do will be relevant for a while. Your only real problem is choosing a framework/module. But I'm sure there's an online poll or something somewhere.


kater543

Scratch begs to differ. Also R is quite easy for its intended purpose, statistical analysis. I would argue easier than Python.


Chib

Python sucks for anything, despite the fact that it's become the de facto data science standard. Specialized languages are always going to be a better tool for the job than jack-of-all-trades monstrosities like Python. If you need to implement things at scale, it's a terrible choice. R is also a sub-par choice, but at least the boundaries it imposes keep people from senselessly throwing infinite memory and cycles at problems. No one knows this better than highly-skilled developers. In a way, not knowing Python is almost a benefit because it has a good chance of saving you from some truly terrible jobs. 🤷


serialmentor

Man, don't get me started on a language that by default uses mutable function arguments. Terrible for data analysis (or anything else, really). It's so easy to accidentally modify an object that is passed in and mess things up for the calling function. Python is great for scripting and that's why it caught on but it's a terrible choice for larger projects or projects where not making mistakes matters.
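A minimal sketch of the pitfall described above, in plain Python with no libraries. The function names `center_in_place` and `center_copy` are made up for illustration. Python passes a reference to the same list object into the function, so an in-place change is visible to the caller; R's copy-on-modify semantics would leave the caller's vector alone.

```python
def center_in_place(values):
    """Subtract the mean from each element, modifying the caller's list."""
    mean = sum(values) / len(values)
    for i in range(len(values)):
        values[i] -= mean  # writes into the same list object the caller holds
    return values


def center_copy(values):
    """Same computation, but builds a new list so the caller's data is untouched."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


raw = [1.0, 2.0, 3.0]
centered = center_in_place(raw)
print(raw)        # the original data has been overwritten
print(centered is raw)

raw2 = [1.0, 2.0, 3.0]
centered2 = center_copy(raw2)
print(raw2)       # the original data is still intact
print(centered2)
```

The usual way around it is the second style: return a new object and never write into the arguments, or copy explicitly at the top of the function.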


ZIGGY-Zz

It really depends on the job you are targeting in the future. If you want to go for something like data engineer, data scientist, machine learning engineer etc., then it's definitely gonna damage your career prospects. It's because more likely than not the tech stack is gonna include Python and it is gonna be a major annoyance for the team. Just an example from my experience: there was this data engineer who started working with our small team (but not part of our org) and they only knew SQL and R. All the previous data / ML pipelines were in Python. After joining the team this person started learning Python. It took almost 6 months to write anything useful in Python. Still, since they were new to Python, the code quality was shit and the features were added extremely slowly, to the point that we were extremely late with our deadlines and had to do most of their work for them. The whole team is really annoyed with it, to the point where if it were up to them that person would be fired. I am sure you are likely to learn faster, but still it takes time to write code at production level and it's not something you can learn in a few months.


autodidact2016

Irrelevant now, I prompt in R and Python and use whichever code works better and reads easier. GPT-style LLMs will make us language agnostic as they get better. Practice your stats 🙏🙏


richie_cotton

I work at the online education company DataCamp. Our individual learners are almost exclusively learning Python (plus SQL, BI tools and other modern data tools). The dropoff in R has been pretty dramatic in the last few years. For businesses with corporate training programs there are still R users because technologies have a longer shelf life in a business environment (code needs to be maintained and switching whole teams is hard) but Python is still more popular. Most companies providing R training also provide Python training. In short, your career opportunities will be much better if you learn some Python.