{"id":780,"date":"2023-10-19T11:46:57","date_gmt":"2023-10-19T09:46:57","guid":{"rendered":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/?page_id=780"},"modified":"2025-02-11T00:38:47","modified_gmt":"2025-02-10T22:38:47","slug":"1-17-2-descarrega-i-exploracio-del-genoma-huma","status":"publish","type":"page","link":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/1-17-2-descarrega-i-exploracio-del-genoma-huma\/","title":{"rendered":"1.17.2. Downloading and exploring the human genome"},"content":{"rendered":"<p>The genome of an organism is organised into a set of chromosomes. In this example, we download the chromosome sequence data of the human genome in its <em>hg38<\/em> assembly. Table 18 lists the download locations that will be used.<\/p>\n<div class=\"tabletitle\"><p>Table 18. 
UCSC Genome Browser web pages.<\/p>\n<\/div>\n<table width=\"603\">\n<tbody>\n<tr class=\"table-header\">\n<td width=\"238\">Page<\/td>\n<td width=\"365\">Address<\/td>\n<\/tr>\n<tr>\n<td width=\"238\">UCSC server home page<\/td>\n<td width=\"365\"><a href=\"http:\/\/genome.ucsc.edu\/\" target=\"_blank\" rel=\"noopener\">http:\/\/genome.ucsc.edu\/<\/a><\/td>\n<\/tr>\n<tr>\n<td width=\"238\">Downloads page (genome data)<\/td>\n<td width=\"365\"><a href=\"https:\/\/hgdownload.soe.ucsc.edu\/downloads.html\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/downloads.html<\/a><\/td>\n<\/tr>\n<tr>\n<td width=\"238\">Species list page<\/td>\n<td width=\"365\"><a href=\"https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/currentGenomes\/\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/currentGenomes\/<\/a><\/td>\n<\/tr>\n<tr>\n<td width=\"238\"><em>Human<\/em> species section of the downloads page<\/td>\n<td width=\"365\"><a href=\"https:\/\/hgdownload.soe.ucsc.edu\/downloads.html#human\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/downloads.html#human<\/a><\/td>\n<\/tr>\n<tr>\n<td width=\"238\"><em>Human<\/em> species directory (hg38)<\/td>\n<td width=\"365\"><a href=\"https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/<\/a><\/td>\n<\/tr>\n<tr>\n<td width=\"238\"><em>Sequence data by chromosome<\/em> page<\/td>\n<td width=\"365\"><a href=\"https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/chromosomes\/\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/chromosomes\/<\/a><\/td>\n<\/tr>\n<tr>\n<td width=\"238\"><em>bigZips<\/em> directory<\/td>\n<td width=\"365\"><a href=\"https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/bigZips\/\" target=\"_blank\" 
rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/bigZips\/<\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"tablefooter\"><p>Source: own work.<\/p>\n<\/div>\n<p>First, go to the <em>Downloads<\/em> page of the UCSC Genome Browser (<a href=\"https:\/\/hgdownload.soe.ucsc.edu\/downloads.html\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/downloads.html<\/a>).<\/p>\n<p>This page lists the available genomes, organised by species. As of 26 April 2023, it holds information on 108 species.<\/p>\n<p>The <em>Human<\/em> link leads to the section dedicated to the human genome. Note that the information for each genome is updated fairly often, so each substantial improvement receives its own version code. Here we work with the assembly known as <em>hg38<\/em>, the most recent one at the time of writing these materials.<\/p>\n<p>From the human species page, the <em>Sequence data by chromosome<\/em> link (<a href=\"https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/chromosomes\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/chromosomes<\/a>) leads to the list of compressed FASTA files: one for each chromosome (chr*.fa.gz); the <em>random<\/em> sequences (chr*_random), which are sequences not placed within the preceding reference chromosomes; and the <em>chrUn_*<\/em> sequences, which are unlocalised sequences whose reference chromosome has not been determined. 
On the date mentioned above, there were 456 FASTA sequences associated with the different chromosomes.<\/p>\n<p>From this last location, the <em>Parent Directory<\/em> link (<a href=\"https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/<\/a>) leads to a listing that contains the <em>bigZips<\/em> directory (https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/bigZips\/), another repository of files, in various formats, for the human genome. All the files are compressed and packaged to reduce transfer time.<\/p>\n<p>The file hg38.chromFa.tar.gz contains the original sequence of the chromosomes, split into separate files. Download it with the <em>wget<\/em> command, but only if you have more than 5 GB of free disk space. If you have less than 5 GB free, download only the FASTA sequence of chromosome 7 from the directory <a href=\"https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/chromosomes\/\" target=\"_blank\" rel=\"noopener\">https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/chromosomes\/<\/a><\/p>\n<p>The <code>df<\/code> command reports the free disk space:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"mowtwo\">$ df -h<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"droide\">Filesystem\u00a0\u00a0\u00a0\u00a0Size\u00a0\u00a0\u00a0\u00a0\u00a0Used \u00a0\u00a0 Available \u00a0\u00a0Use% \u00a0\u00a0 Mounted on\r\n\r\nudev\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0959M\u00a0\u00a0\u00a0\u00a0 0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0959M\u00a0\u00a0      0% 
\u00a0\u00a0\u00a0\u00a0\u00a0\/dev\r\n\r\ntmpfs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0199M\u00a0    1,4M\u00a0\u00a0  197M\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 1% \u00a0\u00a0\u00a0\u00a0\u00a0\/run\r\n\r\n\/dev\/sda5\u00a0\u00a0\u00a0\u00a0\u00a020G\u00a0\u00a0 \u00a0\u00a0\u00a016G\u00a0\u00a0 \u00a0\u00a02,9G\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a085% \u00a0\u00a0\u00a0\u00a0\/\r\n\r\ntmpfs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0991M\u00a0\u00a0\u00a0\u00a0 0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0991M\u00a0\u00a0      0% \u00a0\u00a0\u00a0\u00a0\u00a0\/dev\/shm\r\n\r\ntmpfs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a05,0M\u00a0 \u00a0  4,0K\u00a0\u00a0 \u00a05,0M\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a01% \u00a0\u00a0\u00a0\u00a0\u00a0\/run\/lock\r\n\r\ntmpfs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0991M\u00a0\u00a0\u00a0\u00a0 0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0991M\u00a0\u00a0      0% \u00a0\u00a0\u00a0\u00a0\u00a0\/sys\/fs\/cgroup\r\n\r\n\/dev\/loop1\u00a0\u00a0\u00a0\u00a064M\u00a0\u00a0 \u00a0\u00a0 64M\u00a0\u00a0\u00a0\u00a0 0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0100% \u00a0\u00a0 \/snap\/core20\/1852<\/pre>\n<p>The machine used here has only 2.9 GB of free space (<em>Available<\/em> column, shown as 2,9G), so in this case only the FASTA sequence of chromosome 7 is downloaded. 
If there were enough disk space to download the file with all the information, the procedure would be as follows (the output shown comes from a system configured with a Catalan locale):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"mowtwo\">$ wget https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/bigZips\/hg38.chromFa.tar.gz<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"droide\">--2023-04-26 13:22:55--\u00a0 https:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/hg38\/bigZips\/hg38.chromFa.tar.gz\r\n\r\nS'est\u00e0 resolent hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)... 128.114.119.163\r\n\r\nS'est\u00e0 connectant a hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443... conectat.\r\n\r\nHTTP: s'ha enviat la petici\u00f3, s'est\u00e0 esperant una resposta... 200 OK\r\n\r\nMida: 983726049 (938M) [application\/x-gzip]\r\n\r\nS'est\u00e0 desant a: \u00abhg38.chromFa.tar.gz\u00bb\r\n\r\nhg38.chromFa.tar.gz\u00a0\u00a0\u00a0 100%[===================================&gt;] 938,15M\u00a0 7,82MB\/s\u00a0\u00a0\u00a0 in 2m 25s\u00a0\r\n\r\n2023-04-26 13:25:22 (6,47 MB\/s) - s'ha desat \u00abhg38.chromFa.tar.gz\u00bb [983726049\/983726049]<\/pre>\n<p>Once the file has been downloaded to the GNU\/Linux machine you are working on, it must be unpacked and decompressed to inspect its contents.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"mowtwo\">$ ls -alh hg38.chromFa.tar.gz<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"droide\">-rw-rw-r-- 1 student student 939M de gen.\u00a0 24\u00a0 2014 hg38.chromFa.tar.gz<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"mowtwo\">$ tar -vzxf hg38.chromFa.tar.gz<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" 
data-enlighter-theme=\"droide\">.\/chroms\/\r\n\r\n.\/chroms\/chr1.fa\r\n\r\n.\/chroms\/chr10.fa\r\n\r\n.\/chroms\/chr11.fa\r\n\r\n.\/chroms\/chr11_KI270721v1_random.fa\r\n\r\n.\/chroms\/chr12.fa\r\n\r\n.\/chroms\/chr13.fa\r\n\r\n\u2026<\/pre>\n<p>Although the quality of the human genome sequence is acceptable, it is still being improved. As a result, the archive contains numerous files with fragments or variants that are still under discussion and do not necessarily correspond to a complete chromosome. A chromosome can be displayed in the terminal, here chromosome 7; however, in some parts of the chromosome, such as the beginning, the nucleotide sequence is unknown and is denoted by the character <em>N<\/em>. In addition, the sequence mixes upper-case and lower-case letters: in the UCSC files, lower-case letters mark repeats and upper-case letters mark non-repeating sequence.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"mowtwo\">$ more chr7.fa<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" 
data-enlighter-theme=\"droide\">NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\r\n\r\nNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\r\n\r\n\u2026\r\n\r\n\u2026\r\n\r\nNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\r\n\r\nNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\r\n\r\nGAATTCTACATTAGAAAAATAAACCATAGCCTCATCACAGGCACTTAAAT\r\n\r\nACACTGAAGCTGCCAAAACAATCTATCGTTTTGCCTACGTACTTATCAAC\r\n\r\nTTCCTCATAGCAAACTGGGAGAAAAAAGCAATGGAATGAATAAAATGATA\r\n\r\nGCCACAAAAATCAAGGTGGGAGAAATACTTATTATATGTCCATAAAAAAT\r\n\r\nTTTAATTAATGCAAAGTATTAACACCAATGATTGCAGTAATACAGATCTT\r\n\r\nACAAATGATAGTTTTAGTCTGAACAGGACTATCCAAAAGTTAATTTTCTA\r\n\r\nTAGTAACAGTTTTTAAATAAAATATCAATTCCTGAAACACATAAAATGGT\r\n\r\nCCATGAGTATACAACGAGTGAAAAAAAACAAATTCAGAGCAAAGATAAAT\r\n\r\nTAAGAAGTATCTAATATTCAAACATAGTCAAAGAGAGGGAGATTTCTGGA\r\n\r\nTAATCACTTAAGCCCATGGTTAAACATAAATGCAAATATGTTAATGTTTA\r\n\r\nCTGAATAACTTATCTGTGCCAAGTGGTGTATTAATGATTCATTTTTATTT\r\n\r\nTTCACTAAATCTTTTCTCTAAAGTTGGTGTAGCCTGCAACTAAATGCAAG\r\n\r\nAAATCTGACCTAGGACCTGCACTTCTTACCATTTTGCTCATATTTATTCC\r\n\r\nCTGTGCATTTTTGTAACATGTATATGTTATATATATAGAAAGAGAGAGAG\r\n\r\nGCAGAGATGGAAAGTAATTTATGGAGTTTGATGTTATGTCAGGGTAATTA\r\n\r\nCATGATTATATAATTAACAGGTTTCTTTTTAAATCAGCTATATCAATAGA\r\n\r\nAAAATAAATGTAGGAATCAAGAGACTCATTCTGTCCATCTGTGATAGTTC\r\n\r\nCATCATGATACTGCATTGTCAAGTCATTGCTCCAAAAATATGGTTTAGCT\r\n\r\nCAACactgagtgactataggaaaccagaaaccaggctgggcgctaaagat\r\n\r\ngcaaagatgaatgagacatcatctctgccgtccaaaagcttactgtctag\r\n\r\ntgggagagttacacacgtaaggacagtaatctaataagagctaataagtg\r\n\r\naaaactaagataaattaataatacaagattacagggaaggtttccaaagt\r\n\r\ncaatgaggcctcaaatgaatcttgaaagtgtgcaaggattaaccaaatga<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The genome of an organism is organised into a set of chromosomes. In this example, we download the chromosome sequence data of the human genome in its hg38 assembly. Table 18 lists the download locations that will be used. 
Page Address UCSC server home page http:\/\/genome.ucsc.edu\/ Downloads page (genome data) https:\/\/hgdownload.soe.ucsc.edu\/downloads.html Species [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"acf":[],"_links":{"self":[{"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/pages\/780"}],"collection":[{"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/comments?post=780"}],"version-history":[{"count":8,"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/pages\/780\/revisions"}],"predecessor-version":[{"id":1264,"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/pages\/780\/revisions\/1264"}],"wp:attachment":[{"href":"http:\/\/eines-informatiques.recursos.uoc.edu\/introduccion-a-los-entornos-de-trabajo-gnu-linux\/wp-json\/wp\/v2\/media?parent=780"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}