
Recommendation information and details

06/10/2023 – Recommendation of the day

ChatGPT Citations | Formats & Examples

 

How to cite ChatGPT, Bard and Bing in scientific and professional works? Today’s recommendations provide links to some of the options available to you.

The Tacoma Community College paper covers:

  • Citing AI outputs: It provides information on how to cite the outputs of generative AI tools, such as ChatGPT, Bard, and Copilot, in academic or professional work.
  • AI policies and ethics: It advises the readers to check their instructors’ policies on the use of AI in their course work and to follow the TCC Academic Integrity Policy. It also warns that AI outputs are not reliable sources and should be used with caution.
  • AI citation examples: It gives some examples of how to cite AI outputs in different formats, based on the suggestions from the AI tools themselves. It also recommends using a consistent citation style within an assignment.

 


The presented guidelines, which are still evolving, cover:

  • ChatGPT citations: It provides formats and examples for citing the outputs of ChatGPT.
  • AI policies and ethics: It advises the readers to check their instructors’ policies on the use of AI in their course work and to follow the TCC Academic Integrity Policy. It also warns that AI outputs are not reliable sources and should be used with caution.
  • AI citation examples: It gives some examples of how to cite ChatGPT outputs in different formats, such as APA, MLA, and Chicago, based on the suggestions from the AI tools themselves. It also recommends using a consistent citation style within an assignment.


This APA Style document covers:

  • Citing ChatGPT outputs: It provides formats and examples for citing the outputs of ChatGPT.
  • AI policies and ethics: It advises the readers to check their instructors’ policies on the use of AI in their course work and to follow the TCC Academic Integrity Policy. It also warns that AI outputs are not reliable sources and should be used with caution.
  • AI citation examples: It gives some examples of how to cite ChatGPT outputs in different formats, such as APA, MLA, and Chicago, based on the suggestions from the AI tools themselves. It also recommends using a consistent citation style within an assignment.
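As an illustration of what such a citation format looks like in practice, here is a minimal Python sketch that assembles a reference-list entry and a parenthetical citation in the pattern commonly shown for APA-style references to ChatGPT. The class name `AIToolReference`, its field names, and the sample values are assumptions made for this example; none of them come from the recommended documents, so check the style guide itself before using the output.

```python
from dataclasses import dataclass


@dataclass
class AIToolReference:
    """Fields of an APA-style reference to a generative AI tool (hypothetical helper)."""
    author: str       # organization that publishes the tool
    year: int         # year of the version that was used
    title: str        # name of the tool
    version: str      # version label shown by the tool
    description: str  # bracketed description of the work type
    url: str          # where the tool can be accessed

    def reference_entry(self) -> str:
        """Build the reference-list entry as plain text (italics are not represented)."""
        return (f"{self.author}. ({self.year}). {self.title} "
                f"({self.version}) [{self.description}]. {self.url}")

    def parenthetical_citation(self) -> str:
        """Build the in-text parenthetical citation."""
        return f"({self.author}, {self.year})"


# Sample values are assumptions chosen for illustration.
ref = AIToolReference("OpenAI", 2023, "ChatGPT", "Mar 14 version",
                      "Large language model", "https://chat.openai.com/chat")
print(ref.reference_entry())
print(ref.parenthetical_citation())
```

Keeping the fields separate from the formatting makes it easy to add a second method for another style (MLA, Chicago) while staying consistent within one assignment.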


Also, an APA Style document describes how to cite a work with a nonrecoverable source. It explains:

  • Nonrecoverable sources: It explains what they are and why they should not be included in the reference list. It also provides the format for citing them as personal communications in the text.
  • AI outputs: It gives information on how to cite the outputs of generative AI tools, such as ChatGPT, Bard, and Copilot, in academic or professional work. It advises the readers to check their instructors’ policies on the use of AI in their course work and to follow the TCC Academic Integrity Policy.
  • Broken URLs and DOIs: It instructs the readers to check their reference list entries for accuracy and update them as necessary. It also suggests using the Internet Archive to find archived versions of the pages, or deleting the reference list entry if no archived version is available.

22/02/2021 – Recommendation of the day

Top 15 Websites for Data Scientists to Follow in 2021

Sites and blogs that inspire learning.

Destin Gong, Jan 18 · 6 min read. Towards Data Science

 

AI & Machine Learning

  1. Towards Data Science
  2. Analytics Vidhya
  3. KDnuggets
  4. Springboard

Data Engineering

  1. Uber Engineering Blog
  2. Netflix Tech Blog
  3. Airbnb Engineering & Data Science

Data Visualization

  1. Storytelling with Data
  2. Tableau Viz of the Day
  3. Information is Beautiful
  4. Nightingale

Business Acumen

  1. Entrepreneur
  2. Forbes
  3. Business Insider
  4. Hubspot

 

19/02/2021 – Recommendation of the day

 

Top 15 Websites for Data Scientists to Follow in 2021

 

Sites and blogs that inspire learning.

Destin Gong, Jan 18 · 6 min read. Towards Data Science

 

AI & Machine Learning

  1. Towards Data Science
  2. Analytics Vidhya
  3. KDnuggets
  4. Springboard

Data Engineering

  1. Uber Engineering Blog
  2. Netflix Tech Blog
  3. Airbnb Engineering & Data Science

Data Visualization

  1. Storytelling with Data
  2. Tableau Viz of the Day
  3. Information is Beautiful
  4. Nightingale

Business Acumen

  1. Entrepreneur
  2. Forbes
  3. Business Insider
  4. Hubspot

17/02/2021 – Recommendation of the day

Do you need a graduate degree for data science?  Maybe so. Maybe not.

Jeremie Harris, Dec 18, 2018 · 8 min read. Towards Data Science

“I’ll level with you: I’m a PhD dropout.

 

I’ve gotten a lot of mileage out of that title, by the way: it hints that I’ve done a lot of grad school, but still maintains the aura of badassery that only the word “dropout” can provide. In some ways, it’s the ultimate humble brag. Graduate with a PhD, and you’re one nerd among ten thousand. But drop out 2.5 years into it, and you’re an edgy nerd. People will wonder what other edgy shit you might do next. “Elon Musk dropped out of grad school,” they’ll say. “This guy could be just like Elon!”

The advice I’ve given here is unconventional in many ways. But in a rapidly developing field like data science, convention can often lag considerably behind what’s optimal. As a society, our perception of the value of graduate education is one of the aspects of conventional wisdom that’s most badly in need of catching up to reality.

So take it from a former academic-turned-startup-founder: not all degrees are for everyone. Here’s why.

 

  1. The PhD

If your goal is to become a data scientist or machine learning engineer/researcher, a PhD *might* be a good move. But there are also two big reasons why it might not be:

  1. It takes a REALLY long time to get a PhD.
  2. You’re unlikely to learn anything of value unless you get the “right” PhD from the “right” supervisor.

To the first point: in the U.S. or Canada, a PhD takes anywhere from 4 years to 7 or 8 years to complete. The median time to completion is usually 5 or 6 years, depending on the institution. Now let’s put that into perspective.

You know what wasn’t a thing in data science 5 years ago? Spark, XGBoost, Jupyter notebooks, GloVe, spaCy, TensorFlow, Keras, PyTorch, InceptionNet, ResNet, reinforcement learning (like, basically at all), etc., etc.

To the second point: take a moment to think about who would be supervising you, and why they aren’t already working at Google or Facebook.

 

  2. The M.Sc.

Do you need a Master’s to do data science?

  3. The undergrad?

15/02/2021 – Recommendation of the day

The Complete Guide to Decision Trees

Marco Peixeiro. Towards Data Science, 2019

 

A complete introduction to decision trees, how to use them for regression and classification, and how to implement the algorithm in a project setting.

“A single decision tree is often not as performant as linear regression, logistic regression, LDA, etc. However, by introducing bagging, random forests, and boosting, it can result in dramatic improvements in prediction accuracy at the expense of some loss in interpretation.

In this post, we introduce everything you need to know about decision trees, bagging, random forests, and boosting. It will be a long read, but it will be worth it!”
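To give a feel for the two ideas named in the quote, here is a minimal, standard-library Python sketch (not taken from the article). It fits the smallest possible regression tree, a single split often called a stump, and then bags several stumps trained on bootstrap resamples by averaging their predictions. The function names (`fit_stump`, `predict_stump`, `fit_bagged_stumps`, `predict_bagged`) and the toy data are assumptions made for illustration only; a real project would normally use a library implementation.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold and one mean per side."""
    best = None  # (squared error, threshold, left mean, right mean)
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so predict a constant
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    """Predict with a fitted stump (threshold, left mean, right mean)."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_estimators=25, seed=0):
    """Bagging: fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps in the ensemble."""
    return mean(predict_stump(stump, x) for stump in stumps)


# Toy data (assumed): low targets for small x, high targets for large x.
X = [1, 2, 3, 10, 11, 12]
Y = [1.0, 2.0, 1.0, 5.0, 6.0, 5.0]
stump = fit_stump(X, Y)
forest = fit_bagged_stumps(X, Y)
print(stump, predict_bagged(forest, 2))
```

The single stump is easy to read (one threshold, two averages), while the bagged ensemble has no single rule to inspect, which is the loss in interpretation the quote mentions.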

For hands-on video tutorials on machine learning, deep learning, and artificial intelligence, check out the author’s YouTube channel.

09/02/2021 – Recommendation of the day

The Truth About Data Science — Not As Rosy as You Might Think.

Jason Dsouza. Medium.com

“When Harvard Business Review came out with its article labelling Data Science the sexiest job of the 21st century, it grabbed a lot of eyeballs. Safe to say that that ~3000-word article initiated a mad frenzy as people scrambled to learn Data Science with the eventual hope of becoming a Data Scientist.

About 10 years later, that hype doesn’t seem to be dying anytime soon (if anything, it has exploded exponentially).

But before you think about transitioning into this — admittedly — very lucrative field, there are a few truths that no one seems to be talking about, and which you absolutely must know before the eventual career shift.

  • Supply far exceeds Demand
  • Recruiters have no idea of what a “Real Data Scientist” looks like
  • Data Science involves a lot of Math

 

My advice: Do your research!

Watch videos and read articles on the day-to-day life of a Data Scientist and the Reality vs Expectation. A little research in the beginning will go a long way in helping you make the right choice”.

21/01/2021 – Recommendation of the day

Primena tehnika vizualizacije u bazičnoj statistici (Application of Visualization Techniques in Basic Statistics).

Dejan Pajić. University of Novi Sad, Faculty of Philosophy, 2020. 283 pp. ISBN 978-86-6065-582-2

Free PDF version and interactive version of the book.

This interactive textbook on basic statistics (also available as a PDF), in the author's words, “was written for the Introduction to Statistics course in the undergraduate psychology program at the Faculty of Philosophy in Novi Sad, but it is intended for anyone who wants to learn the key principles of statistical reasoning and the basic procedures of statistical data processing. Most sections of the textbook are accompanied by an interactive exercise in which the reader actively creates data and interprets it with the help of various graphical techniques. The textbook is also available as a PDF, but the advanced options supported in the web version cannot be used there.

Sections of the textbook are accessed from the Sadržaj (Contents) page. We recommend that, before your first reading, you familiarize yourself with the structure of the textbook and then start with the Uvod (Introduction) chapter. In the upper right corner of every page there are navigation links that let you open different pages, move to the next section, or return to the previous one. When you hover the mouse pointer over the icon, a condensed table of contents appears, which makes it easy to see which page you are on and to jump to another section. Click the same icon to see the contents in full.

Besides the interactive exercises, some sections include questions that help you assess how well you have understood and mastered the text you have just read. Click a question to display a box with the answer.

Opening interactive frames

Most sections of the textbook contain an interactive exercise displayed in a separate frame at the top of the screen. Frames open in two ways: by clicking the corresponding links in the text, which look different from other links, or by clicking the link (arrow) at the top of the screen. You can use the arrow to close the frame and to reopen it. Clicking the arrow opens the frame intended for the chapter or passage you are currently reading, which means that different frames can open at different places on the same page.

Searching the textbook

The Pretraga (Search) page contains a search box. When you enter one or more words in the box and click the button, a list of passages containing the search expression is shown. The listed passages are also links to the corresponding pages of the textbook. When entering search terms you can use the * (asterisk) symbol as a wildcard that stands for any string of letters. For example, the expression stat* test* finds passages containing phrases such as statističko testiranje, statistički testovi, statističkih testova, and the like.
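The asterisk wildcard described above can be imitated with a regular expression. The following Python sketch is only an illustration of the idea and is not the textbook's actual search code: the function name `wildcard_query_to_regex` is hypothetical, and the choices to treat `*` as "any run of word characters" and to require the query words to be adjacent are assumptions.

```python
import re


def wildcard_query_to_regex(query):
    """Turn a query such as 'stat* test*' into a compiled regular expression."""
    word_patterns = []
    for word in query.split():
        # Escape the literal pieces and let each '*' match any run of word characters.
        pieces = [re.escape(piece) for piece in word.split("*")]
        word_patterns.append(r"\w*".join(pieces))
    # The query words must appear in order, separated only by whitespace.
    return re.compile(r"\b" + r"\s+".join(word_patterns) + r"\b", re.IGNORECASE)


pattern = wildcard_query_to_regex("stat* test*")
phrases = ["statističko testiranje", "statistički testovi", "statističkih testova"]
for phrase in phrases:
    print(phrase, bool(pattern.search(phrase)))
```

Because Python's `\w` is Unicode-aware for text patterns, letters such as č and ž are matched without extra work.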

User registration

The textbook is available in open access and is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike license (CC BY-NC-SA). A detailed explanation of the CC licenses can be found on the National Open Science Portal (Nacionalni portal otvorene nauke). Registration and login are not mandatory, but they are required to use the advanced options: underlining, bookmarking, and note-taking. The data you provide when registering will not be shared with third parties, and you will not receive messages at the email address you give, except during registration and verification. To register, fill in the short form on the Prijava (Login) page. After you log in, additional links for underlining, bookmarking, and adding notes appear on the navigation bar to the right of the text frame.

Underlining

You can underline parts of the text by clicking the underline button and dragging the mouse pointer across the text. The underlined passages are then listed on the Moj udžbenik (My Textbook) page, which becomes available after login. While a chapter is open, you can jump between underlined passages by pressing W (next) or Q (previous). To remove underlining, click the corresponding button and then the underlined passage. As long as the button is pressed, you can underline different parts of the text. To leave underlining mode, click the button on the navigation bar again or press Esc on the keyboard.

Bookmarking

Adding a bookmark is similar to underlining, but it marks a single place in the textbook that you want to return to on your next visit. You can bookmark a passage by clicking the bookmark button and dragging the mouse pointer across the text. The selected passage then appears on the Moj udžbenik (My Textbook) page and serves as a link to the place where the bookmark was set. To remove a bookmark, click the corresponding button and then the bookmarked passage. When you bookmark a passage, the previous bookmark is deleted automatically.

Note-taking

You can add notes by clicking the notes button and then the desired passage. Enter the note in the field that opens and click the button. All entered notes are listed on the Moj udžbenik (My Textbook) page. While a chapter is open, you can jump between notes by pressing S (next) or A (previous). To display a note, hover the mouse pointer over its icon. To remove a note, click the icon and then the button in the upper right corner of the note entry box.

Technical characteristics

The electronic textbook was built with HTML, JavaScript, and CSS, which all web browsers support. However, Chrome, Firefox, Edge, or Safari is recommended, as is updating the chosen browser to its latest version. The text on the pages is adapted for display on mobile devices, but because of the specifics of the interactive parts of the textbook, a screen width of at least 1,000 pixels (px) is preferable. On narrower screens you will have to scroll the interactive parts horizontally to see all elements.

All visualizations in the textbook were created with the JavaScript library D3 (Data-Driven Documents), which is distributed under the 3-Clause BSD license. The author of the library is the American programmer and data visualization expert Mike Bostock.

Most of the algorithms used for statistical processing are the work of the textbook's author. More complex data-processing procedures are performed with the free JavaScript library jStat. Formulas are rendered with the free JavaScript library MathJax.”

 

… for every recommendation.

15/01/2021 – Recommendation of the day

GitHub Primer for Dummies.

A simple guide to using GitHub to host your complex code. Sam Liebman. Towards Data Science, 2018.

GitHub is an essential tool for programmers around the globe, allowing users to host and share code, manage projects, and build software alongside a growing base of almost 30 million developers. GitHub makes collaborating on code much easier by tracking revisions and modifications, allowing for anyone to contribute to a repository. As someone who only recently started programming, there have been countless times where GitHub has been a literal lifesaver, helping me learn new skills, techniques, and libraries. Yet, sometimes a simple task on GitHub such as creating a new repository or pushing new changes is more daunting than training a multi-layer neural network.

 

  1. Creating a Repository

A GitHub repository (“repo”) is a virtual location on GitHub where a user can store code, datasets, and related files for a project. How to create your repo:

Clicking on the new repository button on the homepage will bring you to a page where you can create a repo and add a name and brief description of the project. There is an option to make your repository public or private, but the private feature is only available to paying users/companies.

You can also initialize the repository with a README, which provides an overview and description of the project. Adding a README to your repository is highly recommended.

 

  2. Initialize your Git

The next step involves using your terminal to initialize your Git and push your first commit. Git is not the same thing as GitHub, although they are related. Git is a revision control system that helps manage source code history and edits, while GitHub is a website that hosts Git repositories. In layman’s terms, Git takes a picture of your project at the time of each commit and stores a reference to that exact state.

  3. Adding Files to Repository

The process for adding changes to your GitHub repo is similar to the initialization process. You can choose to add all the files in your project directory in one fell swoop, or add each file individually as edits are made. For a multitude of reasons, discovered through trial and error, I highly recommend pushing each file individually.
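The sequence described in the steps above (initialize, add a file, commit, push, then add further files one at a time) can be tried safely on your own machine. The sketch below is not from the article: it drives the standard `git` command line from Python and pushes to a local bare repository that stands in for the GitHub URL, so nothing leaves your computer. It assumes `git` is installed; the helper names `git` and `first_push_demo`, the file names, and the placeholder identity are all made up for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in `cwd` and return its trimmed output."""
    command = ["git",
               "-c", "user.name=Example User",      # placeholder identity (assumed)
               "-c", "user.email=user@example.com",
               "-c", "commit.gpgsign=false",
               *args]
    result = subprocess.run(command, cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def first_push_demo():
    """Initialize a repo, commit files one by one, and push them to a remote."""
    if shutil.which("git") is None:
        raise RuntimeError("git is not installed")
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote.git"   # local stand-in for a GitHub URL
        project = Path(tmp) / "project"
        project.mkdir()
        git("init", "--bare", str(remote), cwd=tmp)

        git("init", cwd=project)                        # initialize Git
        (project / "README.md").write_text("# Demo\n")
        git("add", "README.md", cwd=project)            # stage a single file
        git("commit", "-m", "Add README", cwd=project)  # snapshot the project
        git("remote", "add", "origin", str(remote), cwd=project)
        git("push", "-u", "origin", "HEAD", cwd=project)

        (project / "analysis.py").write_text("print('hello')\n")
        git("add", "analysis.py", cwd=project)          # add the next file on its own
        git("commit", "-m", "Add analysis script", cwd=project)
        git("push", cwd=project)

        # Commit subjects as stored in the remote, newest first.
        return git("log", "--all", "--format=%s", cwd=remote).splitlines()
```

Calling `first_push_demo()` returns the commit messages found in the stand-in remote. With a real GitHub repository you would replace the local path in `git remote add origin` with the repository URL shown on its GitHub page.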

13/01/2021 – Recommendation of the day

Quora.com Question (2019):

“Machine learning seems to have settled down into ~ 1000 algorithms. Can’t we simply automate the job of a data scientist by just trying them all on any particular case and retaining the best performing one?”

Some of the interesting answers:

“I am afraid you do not understand what the title “Data scientist” entails. There are 2 main branches in data science/machine learning at the moment: software development (which is also called data science since you are using machine learning frameworks/scalable solutions for these algorithms), and actual data science. …”

 

“Where did this 1000 number come from? There are an almost infinite arrangements of valid networks for every problem, and only a few dozen prominent types of techniques to apply. Neither is near 1000 nor are they things you can cycle through and just try. Let’s try something simpler, say the preparation of data for processing. You need to normalize it. This can be rearranging columns or processing values into a different form among many tasks. Just those two things are simple but we don’t have a way to just try all the options and see what works. If I need to compute the speed between two measurements do I just let the computer randomly choose between operations and inputs until it hits upon the right one? No, of course not…”

 

“A data scientist doesn’t (or definitely shouldn’t) just throw all the existing algorithms at a problem to see which one sticks. A data scientist’s job is to create understanding out of raw data… A lot of that cannot be automated quite so easily. Throwing a classification algorithm at a regression problem is not going to work. If you have structured data (for example, I’m currently dealing with data coming from different sessions of buses; each point is a location and time stamp, together with some more information; just throwing a random machine learning algorithm at that will not understand that it needs to look at individual sessions separately and treat them as sequential data)”

“Yes and no: It seems many problems arising from different fields of study, like speech, image, text, music, control, etc., can be solved using any of the “standard” algorithms. These standard algorithms include random forests, gradient boosting, Monte Carlo… and the list goes on.

  • When I say “Yes”: even though the internal workings of these algorithms are different, they still have a common objective. So, if you have a well-defined objective for your problem, you can iterate over all possible ML models and obtain more accurate and precise predictions. This works fine if you are only concerned about prediction accuracy. If you would like to infer something about the variables involved or the model itself, this brute-force approach is not going to help.
  • Now I come to the “No” part of my answer. As I mentioned above, accuracy is not always king. ML has been very successful in providing better predictions, but it is still in its infancy when compared to more mature subjects like mathematical statistics or physics. No general framework has been identified so far that explains why some of these algorithms are exceptionally good at solving one particular class of problems while others are not.”
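The brute-force idea from the question, and the "Yes" half of the answer above, can be sketched in a few lines of standard-library Python: fit every candidate on training data, score each on held-out data, keep the winner. The three candidates (`fit_mean`, `fit_median`, `fit_line`), the helper `pick_best`, and the synthetic data are assumptions chosen to keep the example tiny; they stand in for the much larger catalogue of real algorithms.

```python
from statistics import mean, median


def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    center = mean(ys)
    return lambda x: center


def fit_median(xs, ys):
    """Candidate 2: always predict the training median."""
    center = median(ys)
    return lambda x: center


def fit_line(xs, ys):
    """Candidate 3: ordinary least-squares straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def pick_best(candidates, train, test):
    """Fit every candidate on `train`, score on `test`, return the best name and all scores."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_squared_error(model, *test)
    return min(scores, key=scores.get), scores


# Synthetic data (assumed): a straight line plus a small alternating disturbance.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
train = ([x for x in xs if x % 4], [y for x, y in zip(xs, ys) if x % 4])
test = ([x for x in xs if x % 4 == 0], [y for x, y in zip(xs, ys) if x % 4 == 0])

candidates = {"mean": fit_mean, "median": fit_median, "line": fit_line}
best_name, scores = pick_best(candidates, train, test)
print(best_name, scores)
```

Note what the loop does not do: it never chooses the target, cleans the data, or decides how to split it. Those decisions were made by hand before the loop started, which is the point several of the answers make.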

 

“This is called ‘autoML’ and already exists. It’s also being developed by every major cloud provider. If this was a data scientist’s job, then they’d be very scared. However, it’s not.

Trying out different models is fun and fairly trivial in difficulty once the data is ready to go. The hard part of the job is everything around finding the best model.

Finding data, pipelining it, cleaning it, validating, wrangling it to the best functional form, mapping to reduce dimensionality, choosing a model that minimises the bias variance tradeoff while still running in an acceptable amount of time, understanding your outputs, translating them to a business solution, putting the power of the tool in the right person’s hands, etc., etc., etc. There are so many other considerations that form a data scientist’s job.

Not to mention the fact that autoML could only work for supervised learning techniques and would not help with unsupervised or reinforcement techniques.

AutoML is super cool but it’s only replacing a very small part of any given data science solution.

(Also, lots of data solutions don’t even involve modelling, for instance visualisation or dashboarding projects.)”

08/01/2021 – Recommendation of the day

Data Science

Statistical inference: hypothesis testing.

A refresher in 64 slides:

  3. Hypothesis testing: basic concepts and procedures
  4. Steps
  5. Hypotheses
  6. Choosing the significance level
  7. Choosing the test statistic
  8. Computing the test statistic
  9–10. Statistical inference
  11. Critical value
  12–13. Type I error (α) and Type II error (β)
  14. Parametric statistical tests
  15. Checking the normality of the distribution
  16. Frequencies
  17. Skewness of the distribution
  18. Kurtosis (peakedness / flatness) of the distribution
  19. Testing hypotheses about population means and proportions
  20–22. Z-test
  23–32. t-test
  33. Nonparametric statistical tests
  34–53. Chi-square test
  54–56. McNemar's test
  57–61. Testing hypotheses about ranks
  62–63. Wilcoxon matched-pairs test
  64. Choosing a statistical test
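To connect the outline to something runnable, here is a small standard-library Python sketch of a two-sided one-sample Z-test that follows the steps listed above: state the hypotheses, choose a significance level, compute the test statistic, compare it with the critical value, and draw a conclusion. It is not taken from the slides. The function name `one_sample_z_test` and the numbers in the usage line are assumptions, and the test presumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample Z-test with known population standard deviation.

    H0: the population mean equals mu0.  H1: it differs from mu0.
    """
    standard_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))          # test statistic
    critical = standard_normal.inv_cdf(1 - alpha / 2)    # critical value for alpha
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))      # two-sided p-value
    return {"z": z, "critical": critical, "p_value": p_value,
            "reject_h0": abs(z) > critical}


# Assumed example: sample of 100 with mean 52, testing against mu0 = 50, sigma = 10.
result = one_sample_z_test(sample_mean=52, mu0=50, sigma=10, n=100)
print(result)
```

Here α is the Type I error rate from the outline: the probability of rejecting H0 when it is in fact true. Lowering `alpha` raises the critical value and makes rejection harder.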

 

 
