Mar 2 2021
Remove a useless parameter of the load_and_check_config function
Update according to the review feedback
What does that mean? Can an alias reference multiple indexes? How does that work in terms of ids, for example?
Yes, an alias can reference multiple indexes. If the same ids are present in several indexes, the risk is to get duplicate results when the documents match the search.
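For illustration only, a minimal sketch of how a single read alias could point at two indexes with the Elasticsearch _aliases API (the index and alias names below are made up for the example):

% curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_aliases -d '
{
  "actions": [
    { "add": { "index": "origin-v1", "alias": "origin-read" } },
    { "add": { "index": "origin-v2", "alias": "origin-read" } }
  ]
}'

A search against origin-read then fans out to both indexes, which is why documents sharing the same id in both of them can show up twice in the results.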
To be sure the disk is OK, as there seems to be a high Raw_Read_Error_Rate count, a complete read/write test was launched. It will take some time to complete:
root@storage1:~# badblocks -v -w -B -s -b 4096 /dev/sda
Checking for bad blocks in read-write mode
From block 0 to 1465130645
Testing with pattern 0xaa: 0.74% done, 3:16 elapsed. (0/0/0 errors)
The disk was put back in place on the server.
Will we use different indexes for T2073?
Even with several indexes, it's not clear (to me at least) whether using a single read alias with several underlying indexes would be more advantageous. It will probably depend on how the search will be used from the API perspective.
It is perhaps more prudent to keep this diff as simple as possible and implement possible improvements in T2073.
WDYT?
Update commit message
Mar 1 2021
The backfill is done; the search on metadata seems to work correctly.
The backfill / reindexation looks aggressive for the cluster and the search. There are a lot of timeouts on the webapp's search:
File "/usr/lib/python3/dist-packages/elasticsearch/connection/http_urllib3.py", line 249, in perform_request raise ConnectionTimeout("TIMEOUT", str(e), e) elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='search-esnode3.internal.softwareheritage.org', port=9200): Read timed out. (read timeout=10))
Feb 19 2021
It seems the filtering is the likely culprit: from a production worker directly plugged into the public swh VLAN, Inria's NTP server can't be reached either:
vsellier@worker01 ~ % ip route
default via 128.93.166.62 dev ens18 onlink
128.93.166.0/26 dev ens18 proto kernel scope link src 128.93.166.16
192.168.100.0/24 dev ens19 proto kernel scope link src 192.168.100.21
192.168.101.0/24 via 192.168.100.1 dev ens19
192.168.200.0/21 via 192.168.100.1 dev ens19
vsellier@worker01 ~ % sudo systemctl stop ntp
vsellier@worker01 ~ % sudo ntpdate sesi-ntp1.inria.fr
19 Feb 17:30:54 ntpdate[1868740]: no server suitable for synchronization found
vsellier@worker01 ~ % sudo ntpdate europe.pool.ntp.org
19 Feb 17:31:42 ntpdate[1868761]: step time server 185.125.206.73 offset -0.555238 sec
vsellier@worker01 ~ % sudo systemctl start ntp
There are still no changes on the ticket status page as of 2021-02-19:
- journal-client and swh-search services stopped
- packages upgraded
root@search1:/etc/systemd/system# apt list --upgradable
Listing... Done
python3-swh.search/unknown 0.6.1-1~swh1~bpo10+1 all [upgradable from: 0.5.0-1~swh1~bpo10+1]
python3-swh.storage/unknown 0.23.2-1~swh1~bpo10+1 all [upgradable from: 0.23.1-1~swh1~bpo10+1]
root@search1:/etc/systemd/system# apt dist-upgrade
- new mapping applied and checked:
- before
% curl -s http://${ES_SERVER}/origin/_mapping\?pretty | jq '.origin.mappings' > mapping-v0.5.0.json
- upgrade
swhstorage@search1:~$ /usr/bin/swh search --config-file /etc/softwareheritage/search/server.yml initialize
INFO:elasticsearch:HEAD http://search-esnode1.internal.softwareheritage.org:9200/origin [status:200 request:0.036s]
INFO:elasticsearch:PUT http://search-esnode2.internal.softwareheritage.org:9200/origin/_mapping [status:200 request:0.196s]
Done.
- after
% curl -s http://${ES_SERVER}/origin/_mapping\?pretty | jq '.origin.mappings' > mapping-v0.6.1.json
- check
% diff -U3 mapping-v0.5.0.json mapping-v0.6.1.json
--- mapping-v0.5.0.json	2021-02-19 15:10:23.336628008 +0000
+++ mapping-v0.6.1.json	2021-02-19 15:12:50.660635267 +0000
@@ -1,4 +1,5 @@
 {
+  "date_detection": false,
   "properties": {
     "has_visits": {
       "type": "boolean"
@@ -25,6 +26,9 @@
         }
       },
       "analyzer": "simple"
+    },
+    "visit_types": {
+      "type": "keyword"
     }
   }
 }
- reset the offsets
% /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --topic swh.journal.objects.origin_visit --to-earliest --group swh.search.journal_client --execute
- A reindex of the origin index to a backup is in progress to evaluate the possible duration of such an operation with the production volume (a reindex API sketch is included after the steps below)
- For this migration, we are lucky as the changes are only new field declarations. The metadata is not yet ingested in production, so the documents don't have to be converted
- stop the journal client
root@search0:~# systemctl stop swh-search-journal-client@objects.service
root@search0:~# puppet agent --disable "stop search journal client to reset offsets"
- reset the offset for the swh.journal.objects.origin_visit topic:
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --topic swh.journal.objects.origin_visit --to-earliest --group swh.search.journal_client --execute
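For reference, the reindex to a backup mentioned above can be done with the Elasticsearch _reindex API; this is only a sketch, and the destination index name is an assumption:

% curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex -d '
{
  "source": { "index": "origin" },
  "dest": { "index": "origin-backup-test" }
}'

With the production volume this can take a while; the synchronous response reports the number of documents copied and the time taken.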
Regarding the missing visit_type, one of the topics carrying the visit_type needs to be read again to populate the field for all the origins.
As the index was restored from the backup, the field was only set for the visits done during the last 15 days.
The offsets will be reset only for the origin_visit topic to limit the work.
Regarding the index size, it seems it's due to a huge number of deleted documents (probably because of the backlog and the documents being updated at each change):
% curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868634      8577610     10.5gb         10.5gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb   <-- A clean copy of the origin index has almost the same size as yesterday
Forcing a merge seems to restore a decent size:
% curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/origin/_forcemerge {"_shards":{"total":80,"successful":80,"failed":0}}%
% curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868684         3454        1gb            1gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb
It will probably be something to schedule regularly on the production index if size matters; a possible sketch follows.
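As an illustration only (the schedule and options are assumptions, not a decision), such a periodic force-merge could be a simple cron entry:

# hypothetical weekly force-merge of the origin index, expunging only segments with deleted documents
# (ES_SERVER is assumed to be defined in the crontab environment)
0 4 * * 0  curl -s -XPOST "http://${ES_SERVER}/origin/_forcemerge?only_expunge_deletes=true"

only_expunge_deletes avoids rewriting segments that contain no deleted documents, which keeps the operation cheaper than a full merge.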
The journal clients recovered, so the index is up-to-date.
Let's check some points before closing:
- The index size looks huge (~10GB) compared to before the deployment
- it seems some documents don't have origin_visit_type populated as they should:
swh=> select * from origin where url='deb://Debian/packages/node-response-time';
  id   |                   url
-------+------------------------------------------
 15552 | deb://Debian/packages/node-response-time
(1 row)
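To quantify the problem on the Elasticsearch side, a possible check (just a sketch, using the visit_types field added by the new mapping) is to count the documents where the field is missing:

% curl -s -H "Content-Type: application/json" http://${ES_SERVER}/origin/_count -d '
{ "query": { "bool": { "must_not": { "exists": { "field": "visit_types" } } } } }'

The count should drop as the origin_visit topic is re-read by the journal client.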
Feb 18 2021
- Copy the backup of the index done in T2780
- delete current index
indexed:
"swh.journal.indexed.origin_intrinsic_metadata",0,15044088
stop the journal clients and swh-search
root@search0:~# puppet agent --disable "swh-search upgrade"
root@search0:~# systemctl stop swh-search-journal-client@objects.service
root@search0:~# systemctl stop swh-search-journal-client@indexed.service
root@search0:~# systemctl stop gunicorn-swh-search.service
update the packages
root@search0:~# apt update && apt list --upgradable
...
python3-swh.search/unknown 0.6.0-1~swh1~bpo10+1 all [upgradable from: 0.5.0-1~swh1~bpo10+1]
...
The dashboard was moved to the system directory: the new URL is https://grafana.softwareheritage.org/goto/uBHBojEGz
swh-search v0.5.0 is deployed in all the environments; the metrics are correctly gathered by Prometheus.
Let's create a real dashboard now [1]
This is the mapping of the origin index with the metadata: P953
Thanks @anlambert, the monitoring is back to green.
Feb 17 2021
lgtm
(forgot to mention: Thanks, it's really a nice improvement)
Please note that a recent version of docker-compose is needed to allow the environment to start with the healthcheck keyword.
It was not working with version 1.26.2 but is OK with 1.28.2:
❯ docker-compose version
docker-compose version 1.26.2, build eefe0d31
docker-py version: 4.2.2
CPython version: 3.7.7
OpenSSL version: OpenSSL 1.1.0l  10 Sep 2019
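As a side note, a quick way to check whether the locally installed docker-compose accepts the compose file (healthcheck stanza included) without starting the environment is to validate it; this is only a sketch and the directory path is an assumption:

# renders and validates the compose file; an unsupported keyword makes older versions error out
❯ cd swh-environment/docker && docker-compose config > /dev/null && echo "compose file OK"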
Looking at it with tcpdump, it seems swh-web doesn't add the headers that would prevent caching the response in case of a 404:
GET /api/1/vault/directory/a317baff051f68e83557d51e59539dac2ff55b34/ HTTP/1.1
Host: archive.softwareheritage.org
User-Agent: python-requests/2.21.0
Accept: */*
X-Forwarded-For: 128.93.166.14
X-Forwarded-Proto: https
Accept-Encoding: gzip
X-Varnish: 230399
After digging, it seems the requests with a 404 return code are cached by varnish.
When the test is launched, a first request is done which returns a 404, then the POST is issued. When the check tries to get the status of the cooking, the initial 404 is returned by varnish.
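One way to confirm that behaviour from the outside is to compare the response headers of two consecutive requests for the same object; this is only a sketch, reusing the URL from the capture above:

# first request: expect a 404; note the Cache-Control, Age and X-Varnish headers
❯ curl -s -o /dev/null -D - https://archive.softwareheritage.org/api/1/vault/directory/a317baff051f68e83557d51e59539dac2ff55b34/ | grep -iE '^(HTTP|cache-control|age|x-varnish)'
# second request just after: if Age increases and X-Varnish reports two ids,
# the 404 was served from the varnish cache
❯ curl -s -o /dev/null -D - https://archive.softwareheritage.org/api/1/vault/directory/a317baff051f68e83557d51e59539dac2ff55b34/ | grep -iE '^(HTTP|cache-control|age|x-varnish)'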
It seems the scheduler has missed some updates. After an upgrade of the python3-swh-.* packages, the error is again the initial one.
After an upgrade of the packages on pergamon and vangogh, the error is now:
Feb 17 10:49:38 vangogh python3[1990225]: 2021-02-17 10:49:38 [1990225] root:ERROR <RemoteException 500 InvalidDatetimeFormat: ['invalid input syntax for type timestamp with time zone: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\nCONTEXT: COPY tmp_task, line 1, column next_run: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\n']>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/core/api/asynchronous.py", line 71, in middleware_handler
    return await handler(request)
  File "/usr/lib/python3/dist-packages/swh/core/api/asynchronous.py", line 178, in decorated_meth
    result = obj_meth(**kw)
  File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth
    return meth(self, *args, db=db, cur=cur, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 220, in cook
    self.create_task(obj_type, obj_id, sticky)
  File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth
    return meth(self, *args, db=db, cur=cur, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 163, in create_task
    task_id = self._send_task(obj_type, hex_id)
  File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 139, in _send_task
    added_tasks = self.scheduler.create_tasks([task])
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
    return self.post(meth._endpoint_path, post_data)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
    self.raise_for_status(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
    raise exception from None
swh.core.api.RemoteException: <RemoteException 500 InvalidDatetimeFormat: ['invalid input syntax for type timestamp with time zone: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\nCONTEXT: COPY tmp_task, line 1, column next_run: "Timestamp(seconds=1613558977, nanoseconds=999614000)"\n']>
Feb 16 2021
I wrote a proposal for the next steps [1] so we could start the work on these counters. All comments/contributions are welcome.
root@0a15636c2914:/# aptitude install cassandra openjdk-11-jre
The following NEW packages will be installed:
  alsa-topology-conf{a} alsa-ucm-conf{a} at-spi2-core{a} ca-certificates-java{a} cassandra dbus{a} file{a} fontconfig-config{a} fonts-dejavu-core{a} fonts-dejavu-extra{a} java-common{a} libapparmor1{a} libasound2{a} libasound2-data{a} libatk-bridge2.0-0{a} libatk-wrapper-java{a} libatk-wrapper-java-jni{a} libatk1.0-0{a} libatk1.0-data{a} libatspi2.0-0{a} libavahi-client3{a} libavahi-common-data{a} libavahi-common3{a} libbsd0{a} libcap2{a} libcups2{a} libdbus-1-3{a} libdrm-amdgpu1{a} libdrm-common{a} libdrm-intel1{a} libdrm-nouveau2{a} libdrm-radeon1{a} libdrm2{a} libedit2{a} libelf1{a} libevent-core-2.1-7{a} libevent-pthreads-2.1-7{a} libexpat1{a} libfontconfig1{a} libfontenc1{a} libfreetype6{a} libgif7{a} libgl1{a} libgl1-mesa-dri{a} libglapi-mesa{a} libglib2.0-0{a} libglib2.0-data{a} libglvnd0{a} libglx-mesa0{a} libglx0{a} libgraphite2-3{a} libharfbuzz0b{a} libice6{a} libicu67{a} libjpeg62-turbo{a} liblcms2-2{a} libllvm11{a} libmagic-mgc{a} libmagic1{a} libmd0{a} libnspr4{a} libnss3{a} libopts25{a} libpciaccess0{a} libpcsclite1{a} libpng16-16{a} libpython2-stdlib{a} libpython2.7-minimal{a} libpython2.7-stdlib{a} libsensors-config{a} libsensors5{a} libsm6{a} libvulkan1{a} libwayland-client0{a} libx11-6{a} libx11-data{a} libx11-xcb1{a} libxau6{a} libxaw7{a} libxcb-dri2-0{a} libxcb-dri3-0{a} libxcb-glx0{a} libxcb-present0{a} libxcb-randr0{a} libxcb-shape0{a} libxcb-shm0{a} libxcb-sync1{a} libxcb-xfixes0{a} libxcb1{a} libxcomposite1{a} libxdamage1{a} libxdmcp6{a} libxext6{a} libxfixes3{a} libxft2{a} libxi6{a} libxinerama1{a} libxkbfile1{a} libxml2{a} libxmu6{a} libxmuu1{a} libxpm4{a} libxrandr2{a} libxrender1{a} libxshmfence1{a} libxt6{a} libxtst6{a} libxv1{a} libxxf86dga1{a} libxxf86vm1{a} libz3-4{a} mailcap{a} media-types{a} mesa-vulkan-drivers{a} mime-support{a} ntp{a} openjdk-11-jre openjdk-11-jre-headless{a} python-is-python2{a} python2{a} python2-minimal{a} python2.7{a} python2.7-minimal{a} shared-mime-info{a} sntp{a} ucf{a} x11-common{a} x11-utils{a} xdg-user-dirs{a}
root@0a15636c2914:/# aptitude install cassandra
The following NEW packages will be installed:
  adwaita-icon-theme{a} alsa-topology-conf{a} alsa-ucm-conf{a} at-spi2-core{a} ca-certificates-java{a} cassandra dbus{a} file{a} fontconfig{a} fontconfig-config{a} fonts-dejavu-core{a} fonts-dejavu-extra{a} gtk-update-icon-cache{a} hicolor-icon-theme{a} java-common{a} libapparmor1{a} libasound2{a} libasound2-data{a} libatk-bridge2.0-0{a} libatk-wrapper-java{a} libatk-wrapper-java-jni{a} libatk1.0-0{a} libatk1.0-data{a} libatspi2.0-0{a} libavahi-client3{a} libavahi-common-data{a} libavahi-common3{a} libbsd0{a} libcairo-gobject2{a} libcairo2{a} libcap2{a} libcups2{a} libdatrie1{a} libdbus-1-3{a} libdeflate0{a} libdrm-amdgpu1{a} libdrm-common{a} libdrm-intel1{a} libdrm-nouveau2{a} libdrm-radeon1{a} libdrm2{a} libedit2{a} libelf1{a} libevent-core-2.1-7{a} libevent-pthreads-2.1-7{a} libexpat1{a} libfontconfig1{a} libfontenc1{a} libfreetype6{a} libfribidi0{a} libgail-common{a} libgail18{a} libgdk-pixbuf-2.0-0{a} libgdk-pixbuf2.0-bin{a} libgdk-pixbuf2.0-common{a} libgif7{a} libgl1{a} libgl1-mesa-dri{a} libglapi-mesa{a} libglib2.0-0{a} libglib2.0-data{a} libglvnd0{a} libglx-mesa0{a} libglx0{a} libgraphite2-3{a} libgtk2.0-0{a} libgtk2.0-bin{a} libgtk2.0-common{a} libharfbuzz0b{a} libice6{a} libicu67{a} libjbig0{a} libjpeg62-turbo{a} liblcms2-2{a} libllvm11{a} libmagic-mgc{a} libmagic1{a} libmd0{a} libnspr4{a} libnss3{a} libopts25{a} libpango-1.0-0{a} libpangocairo-1.0-0{a} libpangoft2-1.0-0{a} libpciaccess0{a} libpcsclite1{a} libpixman-1-0{a} libpng16-16{a} libpython2-stdlib{a} libpython2.7-minimal{a} libpython2.7-stdlib{a} librsvg2-2{a} librsvg2-common{a} libsensors-config{a} libsensors5{a} libsm6{a} libthai-data{a} libthai0{a} libtiff5{a} libvulkan1{a} libwayland-client0{a} libwebp6{a} libx11-6{a} libx11-data{a} libx11-xcb1{a} libxau6{a} libxaw7{a} libxcb-dri2-0{a} libxcb-dri3-0{a} libxcb-glx0{a} libxcb-present0{a} libxcb-randr0{a} libxcb-render0{a} libxcb-shape0{a} libxcb-shm0{a} libxcb-sync1{a} libxcb-xfixes0{a} libxcb1{a} libxcomposite1{a} libxcursor1{a} libxdamage1{a} libxdmcp6{a} libxext6{a} libxfixes3{a} libxft2{a} libxi6{a} libxinerama1{a} libxkbfile1{a} libxml2{a} libxmu6{a} libxmuu1{a} libxpm4{a} libxrandr2{a} libxrender1{a} libxshmfence1{a} libxt6{a} libxtst6{a} libxv1{a} libxxf86dga1{a} libxxf86vm1{a} libz3-4{a} mailcap{a} media-types{a} mesa-vulkan-drivers{a} mime-support{a} ntp{a} openjdk-17-jre{a} openjdk-17-jre-headless{a} python-is-python2{a} python2{a} python2-minimal{a} python2.7{a} python2.7-minimal{a} shared-mime-info{a} sntp{a} ucf{a} x11-common{a} x11-utils{a} xdg-user-dirs{a}
0 packages upgraded, 159 newly installed, 0 to remove and 12 not upgraded.
Feb 15 2021
Adapt according to the review