DOCUMENT IDENTIFICATION
Filename: csabai_istvan_adatintenziv.jpg
Thumbnail: https://dka.oszk.hu/078300/078357/csabai_istvan_adatintenziv_kiskep.jpg
Main title: Adatintenzív megközelítés a tudományokban (Data-intensive approach in sciences)
Uniform title: Adatintenzív megközelítés a tudományokban
Role: creator; Indexed name: Csabai; Given name: István; Name to be inverted: N
Event: recorded; Date: 2021-05-14
Event: available; Date: 2021-04-07
Date note: The date of the lecture.
Type: presentation; lecture
Designation: Presentation; Library science - presentation; Networkshop 2021; Videotorium
Rights holder: Csabai István
Copyright notes: Protected by copyright
Subject: Computing, networks; Subtopic: Internet technology
Subject: Computing, networks; Subtopic: Internet use
Keywords: science, data, data processing, information flow, database (qualifier: subject term/keyword); 2021 (qualifier: period)
Caption: Adatintenzív megközelítés a tudományokban
Raw or OCR text: Data-intensive approach in sciences
ISTVAN CSABAI DEPARTMENT OF PHYSICS OF COMPLEX SYSTEMS ELTE EÖTVÖS LORÁND UNIVERSITY, BUDAPEST
Acknowledgement: Ministry of Innovation and Technology NRDI Office, MILAB Artificial Intelligence National Laboratory Program, FIEK_16-1-2016-0005, 2020-4.1.1.-TKP2020, NVKP_16-1-2016-0004, H2020 VEO No. 874735.
NETWORKSHOP 2021.04.07
History of (machine) intelligence / data science
[Diagram: World, Model; a later build adds Instruments]
Natural intelligence
Homo Sapiens: Technical Specifications
CPU: 100 GN (giga-neurons)
Clock frequency: 4-32 Hz
CPU cores: 1 (male version), 2+ (female version)
CPU speed: 0.1 Flops (floating-point operations / sec)
Memory (short term): 7 ± 2 bits (Pollack, I. The information of elementary auditory displays. J. Acoust. Soc. Amer., 1952, 24, 745-749.)
Storage: 1 TB-2.5 PB
Power: 20 W
Camera: 576 Mpix, 24 Hz
Touch: Yes
Display: No
Speakers: Mono
GPS: No
WIFI: No
Bluetooth: No
2G/3G/4G/5G: No/No/No/No
Latest version update: 100 000 BC
Main features:
• Find food
• Escape predators
• Kill enemies
• Find mate and reproduce
First "Data Science"
Tabulae Rudolphinae (1627), 23 years,
Science - technology - science - technology...
Prototype of modern "data science"
SLOAN DIGITAL SKY SURVEY:
2.5 terapixel image, 5 optical bands; 300 million galaxies; 640 fibers, 1 million spectra
2.5 m telescope, 120 Mpix camera -> 2.5 Tpix; 5 years: 10 TB
New issue: BIG DATA !!!
CfA 1989: 1100 galaxies
Huge data tables
Scientific goals and researcher’s perspective
Queries in data space: e.g. separate stars and galaxies
( (petroMag_i > 17.5)
  and (petroMag_r > 15.5 or petroR50_r > 2)
  and (petroMag_r > 0 and g > 0 and r > 0 and i > 0)
  and ( (petroMag_r - extinction_r) < 19.2
    and (petroMag_r - extinction_r < (13.1 + (7/3) * (dered_g - dered_r) + 4 * (dered_r - dered_i) - 4 * 0.18) )
    and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) < 0.2)
    and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > -0.2)
    and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r)) < 24.2) )
  or ( (petroMag_r - extinction_r < 19.5)
    and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > (0.45 - 4 * (dered_g - dered_r)) )
    and ( (dered_g - dered_r) > (1.35 + 0.25 * (dered_r - dered_i)) ) )
  and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r) ) < 23.3 ) )
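The query above is easier to follow when its cuts are named. Here is a minimal Python sketch that evaluates the same predicate on one catalogue row, assuming the row is a dict keyed by the column names the query uses (petroMag_r, dered_g, ...). The cuts appear to correspond to the luminous red galaxy "Cut I" / "Cut II" selection of Eisenstein et al. (2001); the helper names (c_perp, mu50, is_lrg_candidate) and the example values are hypothetical. As in SQL, AND binds tighter than OR, so the grouping below mirrors the query.

```python
import math

def c_perp(o):
    """Colour term (r - i) - (g - r)/4 - 0.18 from extinction-corrected mags."""
    return (o["dered_r"] - o["dered_i"]) - (o["dered_g"] - o["dered_r"]) / 4 - 0.18

def r_dered(o):
    """Petrosian r magnitude corrected for Galactic extinction."""
    return o["petroMag_r"] - o["extinction_r"]

def mu50(o):
    """Surface-brightness term used in the query: r + 2.5 log10(2*pi*R50^2)."""
    return r_dered(o) + 2.5 * math.log10(2 * 3.1415 * o["petroR50_r"] ** 2)

def is_lrg_candidate(o):
    gr = o["dered_g"] - o["dered_r"]
    ri = o["dered_r"] - o["dered_i"]
    r = r_dered(o)
    cp = c_perp(o)
    # Everything ANDed before the SQL "or" forms the first branch ("Cut I").
    cut_i = (
        o["petroMag_i"] > 17.5
        and (o["petroMag_r"] > 15.5 or o["petroR50_r"] > 2)
        and o["petroMag_r"] > 0 and o["g"] > 0 and o["r"] > 0 and o["i"] > 0
        and r < 19.2
        and r < 13.1 + (7 / 3) * gr + 4 * ri - 4 * 0.18
        and -0.2 < cp < 0.2
        and mu50(o) < 24.2
    )
    # The "or" branch plus the trailing surface-brightness cut ("Cut II").
    cut_ii = (
        r < 19.5
        and cp > 0.45 - 4 * gr
        and gr > 1.35 + 0.25 * ri
        and mu50(o) < 23.3
    )
    return cut_i or cut_ii

# Hypothetical red, bright, compact object.
example = dict(petroMag_i=17.6, petroMag_r=17.6, petroR50_r=1.5, extinction_r=0.1,
               g=19.1, r=17.6, i=17.1, dered_g=19.0, dered_r=17.5, dered_i=17.0)
print(is_lrg_candidate(example))
```

Writing the cuts as named Python terms makes it clear that the two branches are alternative selections, which is hard to see in the flattened SQL.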
New skills: Indexing, databases
• Reading through the full SDSS data set: ~1 day
• Astronomers should learn: database programming, computational geometry, search trees, ...
• Multidimensional and spherical indexing
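To show why spherical indexing avoids a full read-through, here is a minimal sketch of one simple scheme: bucketing sky positions into fixed-height declination zones, so that a cone search only scans the few zones its radius overlaps. This is an illustration under that assumption, not SDSS's implementation (the SkyServer is built around the Hierarchical Triangular Mesh). The class and method names are hypothetical.

```python
import math
from collections import defaultdict

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions, in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

class ZoneIndex:
    """Buckets (ra, dec) positions into fixed-height declination zones."""

    def __init__(self, zone_height_deg=1.0):
        self.zone_height = zone_height_deg
        self.zones = defaultdict(list)

    def _zone(self, dec):
        return int(math.floor((dec + 90.0) / self.zone_height))

    def add(self, obj_id, ra, dec):
        self.zones[self._zone(dec)].append((obj_id, ra, dec))

    def cone_search(self, ra, dec, radius_deg):
        """Return ids of objects within radius_deg of (ra, dec)."""
        # Only zones overlapping [dec - r, dec + r] can hold matches.
        lo = self._zone(max(-90.0, dec - radius_deg))
        hi = self._zone(min(90.0, dec + radius_deg))
        hits = []
        for z in range(lo, hi + 1):
            for obj_id, o_ra, o_dec in self.zones.get(z, []):
                # Exact spherical distance test on the remaining candidates.
                if angular_sep_deg(ra, dec, o_ra, o_dec) <= radius_deg:
                    hits.append(obj_id)
        return hits
```

A production index would also prune by right ascension inside each zone; this sketch keeps only the declination pruning to stay short.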
Modern data science: same trends in biology, environmental sciences, social sciences, ...
Not only astronomy: genomics
Sanger sequencing: first virus sequence, 1977: φX174, 5386 nt
Nyitray László, Pál Gábor: A biokémia és molekuláris biológia alapjai (Fundamentals of biochemistry and molecular biology, 2013)
30 years later: NGS (next-generation sequencing), nanopore sequencing
Moore's law in genomics
Sequencing is getting cheaper. More (public) data available.
Human Genome Project (HGP), 1990–2003: 13 years / 2.7 billion USD; 2020: a few days / <500 USD; 2030: ?
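A back-of-the-envelope comparison using only the two cost figures above, as a sketch. It assumes the interval is measured from the end of the HGP (2003) to 2020 and models Moore's law as a doubling every 2 years; both are assumptions. The HGP figure is a whole-project cost rather than a per-genome sequencing price, so the result is only indicative.

```python
import math

HGP_COST_USD = 2.7e9     # Human Genome Project total (figure from the slide)
COST_2020_USD = 500.0    # upper bound quoted for 2020 (figure from the slide)
YEARS = 2020 - 2003      # assumption: measured from the end of the HGP

fold = HGP_COST_USD / COST_2020_USD          # overall cost reduction factor
halving_time = YEARS / math.log2(fold)       # years per halving of the cost
moore_fold = 2 ** (YEARS / 2)                # assumption: doubling every 2 years

print(f"cost reduction: {fold:.2e}x, halving every {halving_time:.2f} years")
print(f"Moore's law over the same period: {moore_fold:.0f}x")
```

Under these assumptions sequencing cost halves in well under a year, far faster than the two-year doubling usually associated with Moore's law.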
Biology in the 20th -> 21st century
2020.01.01-2025.12.31
• Infectious diseases result from complex interactions across several domains
• Without global monitoring of the drivers we cannot handle or prevent outbreaks
• Needed: collecting, integrating, organizing, sharing and analyzing large, complex data sets
• Barriers: practical, legal and ethical issues
• +P, COVID-19
Task 2.1 Platform development: KOOPLEX: collaborative data
Key challenges: amount of data and complexity of models
$pV = NkT$
$6 \cdot 10^{23} \to 5$
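One reading of these two lines is that a simple system needs only a simple model: the ideal gas law ties the state of about 6·10^23 particles (one mole) to a handful of symbols. Here is a small worked example under that reading. The conditions (one mole at 0 °C and 1 atm) are chosen for illustration and are not from the slide.

```python
N_A = 6.02214076e23   # particles in one mole (exact SI value)
K_B = 1.380649e-23    # Boltzmann constant in J/K (exact SI value)

def ideal_gas_volume(n_particles, temperature_k, pressure_pa):
    """Solve pV = NkT for V, in cubic metres."""
    return n_particles * K_B * temperature_k / pressure_pa

v = ideal_gas_volume(N_A, 273.15, 101325.0)
print(f"{v * 1000:.2f} L")  # prints 22.41 L
```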
Complex systems - complex models
Complex function regression: machine learning!
AI: paradigm shift
Example: image recognition
Method: hand-crafted features
[Figure: f(image) = "apple", f(image) = "tomato", f(image) = "cow"]
IF color=red AND profile=smooth THEN type:=tomato
IF color=red AND HAS(horns) THEN type:=cow
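The two hand-crafted rules can be written as a tiny classifier. This is a sketch that assumes each image has already been reduced to a dict of features; the feature names, the "parts" set standing in for HAS(horns), and the "unknown" fallback are hypothetical.

```python
def classify(features):
    """Apply the slide's hand-crafted rules to a dict of image features."""
    color = features.get("color")
    # IF color=red AND profile=smooth THEN type:=tomato
    if color == "red" and features.get("profile") == "smooth":
        return "tomato"
    # IF color=red AND HAS(horns) THEN type:=cow  (hypothetical "parts" set)
    if color == "red" and "horns" in features.get("parts", ()):
        return "cow"
    return "unknown"  # no rule fired
```

The brittleness is easy to see: a red, smooth apple satisfies the first rule and is labelled "tomato", and every new case needs another hand-written rule.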
[Diagram: Data, Model, Prediction]
Learning -> loss function optimization
images -> points in N-dimensional space
Loss = number of wrong categorizations (error)
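The three lines above (images as points, loss as the number of wrong categorizations, learning as reducing that loss) can be made concrete with a small sketch. It uses the classic perceptron rule, swapped in here as a simple illustration rather than the method used in the talk. It assumes the "images" are already 2-D feature points with labels +1/-1; the data are made up.

```python
def predict(w, b, x):
    """Linear classifier: +1 if w.x + b > 0, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

def zero_one_loss(w, b, data):
    """Loss = number of wrong categorizations."""
    return sum(1 for x, y in data if predict(w, b, x) != y)

def train_perceptron(data, epochs=100):
    """Perceptron rule: move the boundary toward each misclassified point."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        if zero_one_loss(w, b, data) == 0:
            break  # no errors left on the training data
        for x, y in data:
            if predict(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Hypothetical "images" already mapped to points in 2-D feature space.
data = [((2.0, 1.0), 1), ((3.0, 2.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
print(zero_one_loss([0.0, 0.0], 0.0, data))  # errors before training
w, b = train_perceptron(data)
print(w, b, zero_one_loss(w, b, data))
```

On linearly separable data like this the perceptron drives the error count to zero; deep networks replace the 0-1 loss with a differentiable loss so that gradient methods can optimize millions of parameters.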
Complex systems – complex models
To understand complex systems we need complex models
Complex models, 2M+ parameters!
We need
• Huge amounts of data to set up, constrain and parametrize the models
• Powerful computers and clever algorithms
Complex function regression: machine learning!
AI Research, Education and Applications @ Eötvös University
Dept. of Physics of Complex Systems
• Genetics -> antibiotic resistance (Matamoros et al., Pataki et al. 2020)
• Mobile sensors -> Parkinson's disease (Pataki @DREAM, Laki et al. 2016)
• Mosquito images -> vector-borne diseases (Pataki et al. Sci. Rep. 2021)
• Medical imaging -> breast cancer (Ribli et al. @DREAM, Sci. Rep. 2018)
• Weak lensing maps -> cosmological parameters (Ribli et al. Nature Astro. 2018, MNRAS 2019)
• Explainable AI (Ribli et al., in prep.)
• Control of aging-related methylation networks (Palla et al., submitted)
• Pathology images (SOTE TKP collaboration)
• Quantum ML
• MSc, PhD courses
Vector-borne diseases: MosquitoAlert image deep learning
"Zika, dengue, chikungunya, and yellow fever are all transmitted to humans by Ae. aegypti and Ae. albopictus."
F. Bartumeus et al. http://www.mosquitoalert.com/
False(?) negatives:
False(?) positives:
Pataki et al. Sci. Rep. 2021.
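False negatives and false positives are counted against the labels. The question marks on the slide presumably mark cases where the label, not the model, may be wrong, so these counts are only as reliable as the ground truth. A minimal counting sketch, assuming a binary "target species vs. other" task; the function names and data are hypothetical.

```python
from collections import Counter

def confusion_counts(predicted, actual):
    """Tally TP/FP/FN/TN for a binary task (True = target species)."""
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1
        elif p:
            counts["FP"] += 1   # model says target, label says not
        elif a:
            counts["FN"] += 1   # model missed a labelled target
        else:
            counts["TN"] += 1
    return counts

def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def precision_recall(counts):
    precision = _safe_div(counts["TP"], counts["TP"] + counts["FP"])
    recall = _safe_div(counts["TP"], counts["TP"] + counts["FN"])
    return precision, recall
```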
Space weather: whistler detection
Language of the genome
Pollen monitoring
Animal health
Deep learning for colorectal cancer pathology
Mammography with deep learning (Faster R-CNN)
• Digital Mammography DREAM challenge
• 1200 participants
• Dezső Ribli: best final result
• the only solution with localization
• AUC = 0.95
• Publication: Nature Scientific Reports (2018)
• 30th most popular of 17,000 articles
• New collaborations with hospitals and clinics
• More training data
• Open-source plugin
• Steps towards licensing
D. Ribli, A. Horváth, Z. Unger, P. Pollner, and I. Csabai. "Detecting and classifying lesions in mammograms with deep learning." Scientific reports (2018)
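The AUC = 0.95 figure is the area under the ROC curve. Here is a minimal sketch of how AUC can be computed, using its rank interpretation: the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one, with ties counted as one half. This is an illustration on hypothetical scores, not the challenge's scoring code; the O(n·m) pair loop is fine for small examples only.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For example, scores [0.9, 0.2, 0.8, 0.1] with labels [1, 1, 0, 0] give 0.75, because one of the four positive-negative pairs is ranked the wrong way.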
Explainable AI: automatic classification enhancement
Any sufficiently advanced technology is indistinguishable from magic. /Arthur C. Clarke/
Indeed, understanding the laws of mechanics enabled us to build pyramids and cathedrals; the steam engine, built on the laws of thermodynamics, let us cross oceans and continents, and today we all have "seven-league boots" in our garages. Understanding electrodynamics and quantum mechanics gave us the transistor, which is at the heart of the Internet and of the modern "magic mirrors", our mobile phones. With the advancement of high-throughput techniques we may at last be ready to tackle another frontier, the most sophisticated and complex of all: life and intelligence. An end to diseases, much longer healthy lives...?
What miracles will the advancements of machine learning bring? And what kind of challenges?
NEW PARADIGMS NEED NEW RESEARCHERS
EDUCATION: We need new scientists who have professional skills both in their
István Csabai
ELTE Dept. of Physics of Complex Systems csabai@elte.hu http://complex.elte.hu/~csabai/
Document language: English
Related document: Dengel Eszter: Az interfész és az információ tárolása (The interface and the storage of information)
Format: PowerPoint presentation; Pages: 38; Technical note: Microsoft Office PowerPoint 2016; Metadata in document: N
Format: PDF document; Pages: 38; Metadata in document: N
Format: HTML document; Technical note: HTML 5 version; Metadata in document: N
Best format: JPEG image file; Largest image size: 770x433 pixels; Best resolution: 72 DPI; Color: color; Compression quality: medium
General note: Networkshop conference 2021
Record status: KÉSZ (complete); Document status: INSIDE
Role/capacity: cataloguing; Processed by: Nagy Zsuzsanna