Telegram channel page: Spark in me - Internet, data science, math, deep learning, philosophy


Spark in me - Internet, data science, math, deep learning, philosophy

1332 subscribers

All this - lost like tears in rain. Internet, data science, math, deep learning, philosophy. No bs. Our website - http://spark-in.me Our chat - https://goo.gl/WRm93d DS courses review - http://goo.gl/5VGU5A - https://goo.gl/YzVUKf


Category: Technology
10.02.2020 16:02
Facebook's ML Panopticon

Tin-foil hat mode on.

From an engineering standpoint, I admire the balls on these people and their excellent libraries. But you know where all of this will lead us, don't you?

Text Panopticon

Recently they posted CCMatrix, which is:
- Based on Common Crawl
- Mined using translation + FAISS
- 3.5 billion parallel sentences
- 661 million are aligned with English
- 17 language pairs have more than 30 million parallel sentences, 82 have more than 10 million, and most have more than 1 million

To do this, you do not even need a lot of compute (8 GPUs?). But you DO need pre-trained NMT models for all of the languages. This is fine, this was in the news.

Combo

But also, check out how beautiful it is:
- PyTorch (by Facebook) +
- FAISS (by Facebook) +
- NMT (fairseq by Facebook) +
- LASER (by Facebook) +
- Common Crawl (NOT by Facebook) =
Huge NMT corpus

Speech Panopticon

Now to less-covered facts. Facebook owns wit.ai, which collects speech-to-text data. wit.ai STT is inferior to Google's (now), but they presumably collect real business calls from their clients.

Facebook also collects datasets like this one (go to 4.1 Data):

"The training set used for our experiments consists of around 1 million utterances (13.7K hours) of in-house English videos publicly shared by users. The data is completely anonymized and doesn't contain any personally identifiable information (PII). We report results on two test sets - vid-clean and vid-noisy - consisting of around 1.4K utterances each (20 hours). More information about the data set can be found in [23]."

Recently they also started mining books massively. Yeah right, they publish massive unsupervised data, but do not tell me that they have chosen books NOT because books ARE annotation. (Also, wtf - they use FLAC, which takes 10x more space than, say, Opus.) We all know that internally they WILL turn these into annotation. Legally it will probably even be OK, but most likely they will not talk about it.

So, their book dataset is just a form of giving back to the community. Unsurprisingly, all of this coincides with their wav2letter++ releases and their attempts at unsupervised STT (which does not work).

So why do they do all of this? To train and improve STT models in 100 languages and harvest your organs, of course. Annotation in STT is very expensive. VERY expensive. But you can use a knowledge distillation approach if you move step by step.

So what? Connect the obvious dots, please. I hope that you, your family and your friends are already off Facebook.

#deep_learning
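To give an idea of how "translation + FAISS" mining works (this is an illustration, not Facebook's actual pipeline): sentences from both languages are embedded into one shared space (LASER-style), and candidate pairs are matched by nearest-neighbor similarity. A minimal sketch, with NumPy brute force standing in for FAISS and random vectors standing in for real embeddings:

```python
import numpy as np

def mine_parallel_pairs(src_emb: np.ndarray, tgt_emb: np.ndarray,
                        threshold: float = 0.9):
    """Toy bitext mining: for each source embedding, find the closest
    target embedding by cosine similarity and keep the pair if it clears
    a threshold. FAISS does the same search, only approximate and at
    billion scale."""
    # L2-normalize so a plain dot product equals cosine similarity
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                 # (n_src, n_tgt) similarity matrix
    best = sims.argmax(axis=1)         # nearest target for each source
    return [(i, int(j), float(sims[i, j]))
            for i, j in enumerate(best) if sims[i, j] >= threshold]

# Tiny demo: target 0 is a near-copy of source 1, the rest is noise
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 16))
tgt = np.vstack([src[1] + 0.01 * rng.normal(size=16),
                 rng.normal(size=(3, 16))])
pairs = mine_parallel_pairs(src, tgt)
print(pairs)  # source 1 should be matched to target 0
```

A real pipeline adds margin-based scoring on top of raw cosine similarity to filter false positives, but the search itself is exactly this.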

Please note that we are not responsible for the content of any channel posted on our site, since we are not the authors of this information and it is published on the site automatically.


09.02.2020 13:02
Also use my Give $100, Get $25 link if you missed it!




09.02.2020 13:02
Some Proxy-Related Tips and Hacks

Quick, ez and in style =)

DO is not cool anymore

First of all, let's get the elephant out of the room. Remember I recommended Digital Ocean? Looks like they behave like a f**g corporation now - they require a selfie with your passport. F**k this. Even AWS does not need this.

Why would you need proxies?

Scraping, mostly. Circumventing anal restrictions. Sometimes there are other legit use cases, like proxying your tg api requests.

Which framework to use?

None. One of our team tried scrapy, but there is too much hassle (imho) w/o any benefits (apart from their corporate platform crawlera, but I still do not understand why it exists - enterprise, maybe). Just use aiohttp, asyncio, bs4, requests, threading and multiprocessing, and write mostly good-enough code. If you do not need to scrape 100m pages per day or use selenium to scrape JS, this is more than enough. Really. Do not buy into this cargo cult stuff.

Video

For video content there are libraries:
- youtube-dl - lots of features, horrible Python API, nice CLI, it really works
- pytube - was really cool and pythonic, but the author abandoned it. Most likely he just wrote a ton of regexps that he decided not to support. Some methods still work, though

Also remember that many HTTP libraries have HTTP / SOCKS5 support. If a library is old, this may be supported via env variables.

Where to get proxies?

The most interesting part. There are "dedicated" services (e.g. smartproxy / luminati.io / proxymesh / bestproxy.ru / mobile proxy services), and they are probably the only option when you scrape Amazon. But if you need high bandwidth and many requests, such proxies usually have garbage speed (and morally - probably 50% of them are hacked routers). Ofc there is scrapoxy.io - but this is just too much!

Enter Vultr

They have always been a DO look-alike service. I found a simple hacky way to get 10-20-40 proxies quickly: just use Vultr + the Ubuntu 18.04 Docker image + a plain startup script. That is it. Literally. With Docker already installed, your startup script may look something like this:

docker run -d --name socks5_1 -p 1080:1080 -e PROXY_USER=user -e PROXY_PASSWORD=password serjs/go-socks5-proxy && docker run -d --name socks5_2 -p 1081:1080 -e PROXY_USER=user -e PROXY_PASSWORD=password serjs/go-socks5-proxy

There are cheaper hosting alternatives - Vultr is quite expensive - but this startup-script feature + Docker images really save time. And now they have the nice features - backups / snapshots / startup scripts / ez scaling / etc - w/o the corporate bs.

Also use my Give $100, Get $25 link! Beware - new accounts may be limited to 5 servers per account. You may want to create several accounts at first.

#data_science #scraping
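On the client side, most Python HTTP libraries can route through such a SOCKS5 endpoint via a proxy URL (requests needs the `requests[socks]` extra). A minimal sketch - the IP here is a documentation placeholder, and the credentials match the docker command above:

```python
import random

def socks5_proxies(host: str, port: int, user: str, password: str) -> dict:
    """Build a proxies dict for `requests` that routes both http and
    https through one SOCKS5 endpoint. The socks5h scheme also resolves
    DNS on the proxy side, which matters for scraping."""
    url = f"socks5h://{user}:{password}@{host}:{port}"
    return {"http": url, "https": url}

# A pool matching the two containers started by the startup script
pool = [socks5_proxies("203.0.113.10", port, "user", "password")
        for port in (1080, 1081)]

proxies = random.choice(pool)  # pick a random proxy per request
print(proxies["https"])

# The actual request (commented out - this is only a sketch):
# import requests
# requests.get("https://example.com", proxies=proxies, timeout=10)
```

With aiohttp you would instead pass the same URL through a connector such as the third-party aiohttp-socks package; the dict form above is specific to requests.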


07.02.2020 11:02
Setting up Wi-Fi on a Headless Server

Yeah, that's a pain in the ass! A chicken-and-egg problem: you need to install packages, but you need to set up Wi-Fi first. So you install the packages ... by copying them over on a USB stick. Remember CD-ROM sneakernet? =)

Also, making Wi-Fi survive reboots is a pain. These guides worked for me on Ubuntu 18.04.3 server:
- https://www.linuxbabe.com/command-line/ubuntu-server-16-04-wifi-wpa-supplicant
- rc.local worked for me instead of systemd or crontab: https://gist.github.com/mohamadaliakbari/1cb9400984094541581fff07143e1c9d
- better use your router to nail down a static IP

#linux
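As a rough sketch of the approach from those guides (the interface name, SSID and passphrase are placeholders - yours will differ; run as root):

```shell
# Generate a config with a hashed PSK
# (wpa_passphrase ships with the wpa_supplicant package)
wpa_passphrase "MyHomeWiFi" "my-secret-pass" > /etc/wpa_supplicant.conf

# Bring the link up manually once to test
wpa_supplicant -B -i wlp2s0 -c /etc/wpa_supplicant.conf
dhclient wlp2s0

# Make it survive reboots via /etc/rc.local (the variant that worked for me)
cat <<'EOF' > /etc/rc.local
#!/bin/bash
wpa_supplicant -B -i wlp2s0 -c /etc/wpa_supplicant.conf
dhclient wlp2s0
exit 0
EOF
chmod +x /etc/rc.local
```

Find your actual interface name with `ip link` before filling it in.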


04.02.2020 16:02
2020 DS / ML Digest 2

Highlights:
- New STT benchmarks from FAIR
- Analysis of GPT-2 by The Gradient
- Google's Meena - a 2.6 billion parameter end-to-end trained neural conversational model (not AGI, ofc)
- OpenAI now uses PyTorch
- LaserTagger - a cool idea on how to handle simpler s2s tasks, e.g. error correction

Please like / share / repost!
https://spark-in.me/post/2020_ds_ml_digest_02

#digest


04.02.2020 11:02
AGI is near, they said https://github.com/google-research/google-research/blob/master/meena/meena.txt#L53 https://github.com/google-research/google-research/blob/master/meena/meena.txt#L576


03.02.2020 14:02
Using Voila With The Power of Vue.js

Remember the last post about Python dashboard / demo solutions? Since then we tried Voila and Streamlit.

Streamlit is very cool, but you cannot build really custom things with it. You need 100% support for the widgets you use, and there are always issues with caching. It is also painful to change the default appearance.

Remember that Voila "in theory has a customizable grid and CSS"? Of course someone took care of that! Enter ipyvuetify + voila-vuetify. TLDR - this lets you build Voila demos with the Vuetify UI library, all in 100% Python.

Problems:
- No native table widget / method to show pandas tables. There are solutions that load Vue UI tables, but none out-of-the-box
- All plotting libraries will work mostly fine
- All Jupyter widgets will work fine. But when you need a custom widget, you will have to either code it or find some hack with js-links or manual HTML manipulations
- Takes some time to load; will NOT scale to hundreds / thousands of concurrent users
- ipyvuetify is poorly documented, and the examples are not very intuitive

Links:
- https://github.com/voila-dashboards/voila-vuetify
- https://github.com/mariobuikhuizen/ipyvuetify

#data_science
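For the missing table widget, one common workaround (a sketch, not an official voila-vuetify recipe) is to render the DataFrame to HTML yourself and hand the string to any HTML-capable widget; the widget part is commented out so the snippet stands alone:

```python
import pandas as pd

def df_to_html(df: pd.DataFrame) -> str:
    """Render a DataFrame as an HTML table string that an
    HTML-capable widget can display; the CSS class lets you
    style it from the dashboard's stylesheet."""
    return df.to_html(index=False, border=0, classes="my-table")

# Toy data standing in for whatever your dashboard shows
df = pd.DataFrame({"model": ["wav2letter", "jasper"],
                   "wer": [0.15, 0.12]})
html = df_to_html(df)
print(html[:60])

# Inside a notebook served by Voila you could then do, e.g.:
# import ipywidgets as w
# w.HTML(value=html)  # or nest it inside an ipyvuetify layout component
```

This keeps everything in Python and avoids waiting for a native table widget, at the cost of styling the table by hand.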


29.01.2020 16:01
The State of Native Quantization in PyTorch

Yeah, right. PyTorch 1.3 and / or 1.4 boasted native qint8 quantization support. So cool, right? They have 2 main tutorials:
- Memory-intensive networks with linear layers (BERT) => dynamic quantization
- Convolutional networks => static quantization

I have not tried vanilla BERT and / or vanilla MobileNet (please tell me if you have!), but it looks like:
- 1D convolutions are not supported yet
- Native nn.Transformer dynamic quantization ... does not work

It is kind of meh, because I used their native layers (instead of Hugging Face's, for example) ... to avoid this exact kind of issue! =)

Anyway, tell me if quantization worked for you in PyTorch, and meanwhile you can upvote these feature requests / bug reports if you feel these features are useful:
- Fix nn.Transformer quantization
- 1D conv support

Ofc I could also spend a couple of months fiddling with quantization myself, but it looks like this year will be more production-driven, so why do the same job they are clearly focused on doing now?)

#deep_learning
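For reference, the dynamic-quantization path that does work on linear layers looks roughly like this - a minimal sketch on a toy model, not the BERT tutorial itself:

```python
import torch
import torch.nn as nn

# A toy linear-heavy model standing in for something BERT-like
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Dynamic quantization: weights are stored as int8, activations are
# quantized on the fly; only nn.Linear layers are converted here
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out = qmodel(x)
print(out.shape)  # the quantized model still produces float outputs
```

The point of the post is that this recipe breaks once your model contains nn.Transformer blocks or 1D convolutions - the set of convertible module types is just too small.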


28.01.2020 13:01
Collapsible Headings now in JupyterLab - https://github.com/aquirdTurtle/Collapsible_Headings The only thing that kept me from switching! Please share your favourite plugins for JupyterLab! #data_science


23.01.2020 19:01
https://umap-learn.readthedocs.io/en/0.4dev/plotting.html


23.01.2020 19:01
Soo cool! UMAP has a dedicated built-in plotting tool!)))


18.01.2020 10:01
There are no pre-built binaries, and the latest Navi series of cards is not supported. Who the hell needs that?


18.01.2020 10:01
About AMD support ...


17.01.2020 15:01
Decided to redo my OpenVPN installation, since I have already given my VPN to several people.

Tried pritunl - it really works out of the box; a full installation would probably take 10-15 minutes.

Another valid alternative is a dockerized OpenVPN.

Some time ago I wrote a plain down-to-earth guide for Windows users on how to rent a server, create a key, etc. - if you would like the same for this VPN, ping me.
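For the dockerized route, the commonly used kylemanna/openvpn image boils down to a few commands; this is a sketch, and the domain and client name are placeholders:

```shell
# Persist config and keys in a named volume; generate the server config
docker run -v ovpn-data:/etc/openvpn --rm kylemanna/openvpn \
    ovpn_genconfig -u udp://vpn.example.com
docker run -v ovpn-data:/etc/openvpn --rm -it kylemanna/openvpn ovpn_initpki

# Start the server
docker run -v ovpn-data:/etc/openvpn -d -p 1194:1194/udp \
    --cap-add=NET_ADMIN kylemanna/openvpn

# Generate a client certificate and export a ready-to-use .ovpn profile
docker run -v ovpn-data:/etc/openvpn --rm -it kylemanna/openvpn \
    easyrsa build-client-full alice nopass
docker run -v ovpn-data:/etc/openvpn --rm kylemanna/openvpn \
    ovpn_getclient alice > alice.ovpn
```

Hand the resulting .ovpn file to whoever needs access - that is the whole client setup.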